This is a method of quickly computing the power dissipated by a digital circuit
using information available at the gate library level. It estimates the short-circuit
power by modeling the energy dissipated by the cell per input transition as a function
of the transition time or edge rate, and multiplying that value by the number of
transitions per second for that input.
METHOD AND APPARATUS FOR ESTIMATING THE POWER
DISSIPATED BY A DIGITAL CIRCUIT 

Field of Invention

This invention is related to the field of designing digital circuits. In 
particular, this invention is related to estimating the power that would be 
dissipated by a digital circuit. 

Description of the Related Art

Power as a Factor in Digital Design 

With the advent of portable applications such as notebook computers,
cellular phones, palm-top computers, etc., there is a growing emphasis in the
hardware design community on Computer Aided Design (CAD) tools for low
power IC design. Today, the predominant differentiator of portable
applications in the marketplace is their "battery life", not their performance.
Even designers of high performance ICs are expressing a need for such tools
because clocks are running faster, chips are getting denser, and packaging and
thermal control are playing a dominant role in determining the cost of such
ICs. The cost of upgrading from plastic packaging, which typically can
handle peak power dissipation of approximately 1 Watt, to ceramic
packaging, which has lower thermal resistivity, can be roughly a tenfold
increase in cost.

Managing Power in a Typical Digital Design Flow

An important part of minimizing power dissipated by a system is
reducing the power dissipated by the chips in the system. Because fabricating
chips is expensive and time consuming, a chip designer often uses CAD tools
to estimate the power dissipation of a particular design before actually
fabricating the chip in silicon. From this power estimate the designer can
modify the design before fabrication to reduce the power dissipation.
However, the conventional method of estimating power at the design phase
has its own problems. Figure 1 is a flow diagram illustrating a conventional
design flow used by a designer to reduce the power dissipated on a chip.

A general description of the process and techniques used to design and
analyze digital designs can be found in the Principles of CMOS VLSI Design
by Neil H.E. Weste and Kamran Eshraghian, published in 1992 by
Addison-Wesley Publishing Company, ISBN 0-201-53376-6, which is hereby
incorporated by reference. Another overview of the design process can be
found in U.S. Patent Application 08/226,147 entitled "Hardware Description
Language Source Debugger" by Gregory, et al., filed on April 12, 1994,
which is hereby incorporated by reference. Another overview of the design
process can be found in co-pending U.S. Application 08/253,470 entitled
"Architecture and Methods for A Hardware Description Language Source
Level Debugging System", filed on June 3, 1994, which is hereby
incorporated by reference.

In Figure 1, the general design flow begins with
a semiconductor vendor constructing a library of cells, as shown in step 1000.
These cells perform various combinational and sequential functions. The
semiconductor vendor, with the help of CAD tools, characterizes the electrical
behavior of those cells. For example, the vendor provides estimates of the
delay through each cell and how much substrate area the cells will occupy.
This establishes a library of components that a designer can use to build a
complex chip.

Recently, semiconductor vendors have also started characterizing the
power dissipation of the library cells as a single static value. However, the
power dissipation of a cell is a complex function of the loading on the cell's
output(s), toggle rates of the cell's inputs and outputs, and transition times of
the cell's inputs. Without a model that allows them to capture the dependence
of the cell's power on those three principal factors, semiconductor vendors
have instead resorted to characterizing a single static value (normally in units
of Joules per KHz). Because this model ignores all of the key factors that
influence power dissipation, its results are only usable as very rough
estimates.

In step 1010, the designer specifies the functional details of the
design. One method that the designer can use to describe the design is to
write a synthesis source description in a Hardware Description Language
(HDL). The designer could also describe the design with a schematic capture
tool, bypassing steps 1010 and 1020.

In step 1020, the CAD system creates a network of gates that
implement the function specified by the designer in step 1010. This is
commonly referred to as the synthesis step. Importantly, at this step, the
CAD system has information about which cells are going to be used and how
the cells will be connected to each other.

In step 1030, the CAD system determines where the cells identified in
step 1020 will be placed on the chip substrate, and how the connections
between the cells will be routed on the substrate. This is commonly referred
to as the layout or "Place & Route" step. This step establishes the physical
layout of the chip. Ordinarily, it requires a significant amount of computation
time.

In step 1035, the CAD system extracts a transistor level netlist for the
design from the layout.

In step 1040, the CAD system estimates the power used by the chip
from the netlist extracted in step 1035. This is done by applying a
representative set of input stimuli to a simulation model derived from the
netlist. Constructing the input stimuli and simulating the stimuli requires a
significant amount of computation time. This detailed simulation, however,
can produce an accurate estimate of the power that the final chip will
dissipate. The accuracy of the estimates depends on how representative the
input stimuli set is of the actual operation of the design. Sometimes,
the stimuli set is selected for purposes of functional testing of the design, in
which case the stimuli set will not be representative of the normal operation
of the design.

In step 1050, the designer determines whether the power dissipated by
the chip is sufficiently low to meet the designer's needs with respect to battery
life and the package used. If not, the designer modifies the design in step
1060, and repeats steps 1020, 1030, 1035, and 1040. If the power dissipation is
within bounds, and the design meets all other requirements, the chip is
fabricated in step 1070.

Limitations of Existing Power Estimation Methods

The general design flow of Figure 1 presents several obstacles to a
designer seeking insight about the power dissipated by the design. Steps
1030, 1035, and 1040 are time consuming because they involve constructing
layout information and simulating the design. A designer concerned about
power dissipation may have to iterate through the loop indicated by steps
1020, 1030, 1035, 1040, and 1050 several times to obtain an acceptable
result. This can substantially delay the development of a chip. Alternatively,
because of the perceived development delay, the designer may be forced to
proceed with a design that may not necessarily meet the specified power
budget or that may dissipate power unnecessarily.

A power estimation method that doesn't rely on layout information and
that doesn't require input stimuli to be simulated would allow designers to
more easily understand and manage their power problems earlier in the design
flow and in a more cost-effective manner. This is similar to problems in the
timing of digital designs. Until recently, designers usually simulated their
designs to understand if there were any timing problems in the design. In the
last several years, however, static timing analysis has been adopted by many
digital designers as a fast and accurate replacement for timing simulation.
Static timing analysis predicts the timing problems in a design without
performing any dynamic simulation of the design.

Several journal articles and conference papers have described methods
of performing a similar static power analysis to estimate the dynamic power
of combinational designs. These include the following, which are hereby
incorporated by reference:

1) Estimating Power Dissipation in VLSI Circuits, by F. Najm,
IEEE Circuits and Devices Magazine, Vol. 10, Issue 4, pp.
11-19, July, 1994.

2) Estimation of Average Switching Activity in Combinational
and Sequential Circuits, by A. Ghosh, S. Devadas,
K. Keutzer, and J. White, 29th ACM/IEEE Design Automation
Conference, pp. 253-259, June, 1992.

3) Transition Density, a Stochastic Measure of Activity in
Digital Circuits, by F. Najm, 28th ACM/IEEE Design
Automation Conference, pp. 644-649, June, 1991.

4) Efficient Estimation of Dynamic Power Consumption under
a Real Delay Model, by C-Y. Tsui, M. Pedram, and A. M.
Despain, IEEE International Conference on Computer-Aided
Design, pp. 224-228, November, 1993.

5) On Average Power Dissipation and Random Pattern
Testability of CMOS Combinational Logic Networks, by
A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, IEEE/ACM
International Conference on Computer-Aided Design,
pp. 402-407, November, 1992.

6) Estimating Dynamic Power Consumption of CMOS Circuits,
by M. A. Cirit, IEEE International Conference on
Computer-Aided Design, pp. 534-537, November, 1987.

PCT/US95/07040

In addition, there are other articles and papers that describe power
estimation techniques that are similar to one or more of the above papers.
However, the approaches described in all of these papers and articles
focus on purely combinational designs with a manageable number of cells, and
they all use simplified models for power dissipation. Consequently, the
applicability of the above approaches is limited to small combinational designs
that contain no sequential elements (flip-flops, latches, or memory
components).

Estimation of Switching Activity in Sequential Circuits with
Applications to Synthesis for Low Power, by J. Monteiro, S. Devadas, and
B. Lin, in the 31st ACM/IEEE Design Automation Conference, pp. 12-17,
1994, describes extensions to the original combinational propagation methods
to allow those techniques to operate on designs that contain sequential
elements, and it is hereby incorporated by reference. However, this paper
utilizes very simplified models of sequential elements, allowing it to
operate only on simple D-type flip-flops without any asynchronous inputs or
clock-gating signals. Moreover, like the earlier combinational propagation
techniques, it also uses a simplified power model that ignores all but net
switching power dissipation. Finally, the overall strategy that it describes
for processing designs requires significant computation time and can only
work on relatively small designs.

Limitations in the prior art point to a strong
need for a power estimation method that can:

1) robustly deal with a range of design styles including designs
that contain a combination of combinational and sequential
cells, pipelined designs, state machine designs, hybrid designs
that contain a mix of pipelined structures and state machines,
complex clocking schemes, gated clocks, and latch-based
designs.

2) process arbitrarily complex combinational logic.

3) efficiently model all of the principal types of power dissipation.

Circuit Design Structure

The basic functional element of a digital design is a transistor. As
digital design has progressed, the level of abstraction has been raised to the
gate- or cell-level. A cell contains a collection of transistors connected into
an electrical circuit that performs a combinational or sequential function. A
typical cell might implement a NAND function or act as a D flip-flop. A
design consists of an interconnected collection of cells. A cell's inputs and
outputs are referred to as pins. Generally, the interconnections between cells
are referred to as nets. The primary input and output interface ports of the
design are the means by which external components can interact with the
design. These ports will be referred to as the primary inputs and primary
outputs of the design, respectively.

Sometimes a cell performs a more complicated function, such as an 
AND-OR combination. In some situations, some of the internal connections 
within such a cell need to be treated by the CAD tools as though those
connections were nets, and were connecting different cells. For example, in 
an AND-OR cell, the connection between the AND component of the cell and 
the OR component of the cell may need to be treated as a net. 

Types of Power Dissipation

There are three kinds of power dissipation in a digital CMOS circuit:
leakage power, net switching power, and cell internal power. Figure 2 shows a
transistor level schematic of a CMOS inverter that will be used to illustrate the
different types of power dissipation. For simplicity, input 1 can be in one of
four states: held at a high voltage; held at a low voltage; transitioning from
a high voltage to a low voltage; or transitioning from a low voltage to a high
voltage. From a functional point of view, when input 1 is at a high voltage,
transistor 2 is turned off, and transistor 6 is turned on, pulling the voltage at
output net 4 to the same potential as ground 7. When input 1 is at a low
voltage, transistor 2 is turned on and transistor 6 is turned off, pulling output
net 4 to approximately the same potential as VDD 3.

For improved accuracy, a power estimation method must model all
three components of power dissipation. Existing power estimation methods
tend to completely ignore the cell internal and leakage power. However, as
was pointed out by Harry J.M. Veendrick in Short-Circuit Dissipation of
Static CMOS Circuitry and its Impact on the Design of Buffer Circuits in
the IEEE Journal of Solid-State Circuits, Vol. SC-19, No. 4,
pp. 468-473 (August, 1984), which is hereby incorporated by reference, in
some cases cell internal power can be as great as the net switching power.

Leakage or Static Power Dissipation

In both of the cell's steady states (Logic-0 and Logic-1), a small
leakage current flows from the gate's source to its drain. This is referred to
as subthreshold leakage, and it is due to the fact that the gate is not
completely shut off, causing some current to flow from VDD through the gate
to GND. In addition, leakage current can flow through the reverse-biased
junction between the diffusion and substrate layers. These leakage currents
cause leakage power.

Leakage power is also referred to as static power because leakage
power is dissipated all the time, regardless of whether the circuit is active or not.
That is, a cell will always have a small amount of leakage current whether the
cell's output is transitioning or stable. For some gates, the leakage current
may be so minimal that it can be effectively ignored.

The total leakage power dissipated in a design is the sum of the
leakage power for all cells in the design.

Dynamic Power Dissipation

In contrast to static power, dynamic power is only dissipated when the
circuit is active. That is, a cell only consumes dynamic power if the cell's
outputs (or internal nodes) are transitioning from one voltage level to another.
For example, in Figure 2, the cell will dissipate dynamic power when input
1 is making a transition.

The two principal types of dynamic power are net switching power (or
simply switching power) and cell internal power (or simply internal power).
The total switching power dissipated in a design is the sum of the switching
power for all nets in the design. The total internal power dissipated in a
design is the sum of the internal power for all cells in the design.

Net Switching Power Dissipation

In Figure 2, output net 4 behaves electrically as though there were a
capacitor connecting it to ground. This capacitive effect is modeled with
capacitor 5. Net switching power results from the current that flows to charge
or discharge capacitor 5. For example, during the period where input 1
transitions from a high voltage to a low voltage, transistor 2 acts as a resistor.
Transistor 2 and capacitor 5 act as an RC circuit that eventually puts a high
voltage at output net 4. The amount of energy dissipated during a single
transition is given by ½CV², where C represents the capacitance of capacitor
5 and V is the voltage at VDD 3. The capacitance, C, is determined
primarily by the wiring connections between cells and the input capacitance
of loads on the net. C is therefore a function of what the cell is connected to,
and can be estimated from libraries and the gate level design at step 1020.
This would use the wire load model in the library. Alternatively, C can be
obtained using back annotation from extracted layout data. A reasonable
estimate of the switching power dissipated is therefore the number of
transitions per second times the energy dissipated per transition.
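
The estimate described above can be written out for a single net. The following is a minimal sketch only; the function name and the numeric values are hypothetical and are chosen purely for illustration.

```python
def net_switching_power(capacitance_f, vdd_v, toggle_rate_hz):
    """Average switching power (watts) of one net.

    Energy per transition is (1/2)*C*V^2; multiplying by the number of
    transitions per second gives the average power.
    """
    energy_per_transition_j = 0.5 * capacitance_f * vdd_v ** 2
    return energy_per_transition_j * toggle_rate_hz


# Hypothetical example: a 50 fF net, a 3.3 V supply, and
# 20 million transitions per second.
power_w = net_switching_power(50e-15, 3.3, 20e6)
print(f"{power_w * 1e6:.3f} uW")
```

In practice the capacitance would come from the wire load model or from back-annotated layout data, as noted above.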

Cell Internal Power Dissipation

During a transition, both transistor 2 and transistor 6 are turned on,
and behave as non-linear resistors. This creates a current flow from VDD 3
to ground 7. Cell internal power dissipation is caused by this current flow.
Internal power also accounts for the energy dissipated in the charging or
discharging of any capacitances that are internal to the cell. For example, a
sequential cell consumes internal power during the charging and discharging
of capacitances at nodes of the internal clock tree whenever the clock signal
transitions.

Estimating the Number of Transitions for Every Net

As described above, one way to estimate the switching power
dissipated at a net is to compute the energy dissipated per transition at that
net, and multiply it by the number of transitions expected per second at that
net. The number of transitions per second is referred to as the toggle rate,
transition density, or activity factor of that net. Depending on the complexity
of the design, estimating a net's toggle rate can be a computationally
expensive task.

One method for computing the toggle rate associated with a net is to
develop stimuli and simulate the entire design. During the simulation, the
simulator keeps track of the number of transitions occurring at each net.
Dividing the transition count of a net by the simulated time provides an
estimate for the toggle rate of that net. However, this approach requires a
substantial amount of computation to allow complete simulation of the circuit.
The following papers describe various simulation-based analysis methods, and
they are hereby incorporated by reference:

1) Accurate Simulation of Power Dissipation in VLSI Circuits,
by S. M. Kang, IEEE Journal of Solid-State Circuits, Vol.
SC-21, No. 5, pp. 889-891, October, 1986.

2) An Accurate Simulation Technique for Short-Circuit Power
Dissipation based on Current Component Isolation, by
G. Y. Yacoub and W. H. Ku, IEEE International Symposium
on Circuits and Systems, pp. 1157-1161, 1989.

3) McPOWER: A Monte Carlo Approach to Power Estimation, by
R. Burch, F. Najm, P. Yang, and T. Trick, IEEE/ACM
International Conference on Computer-Aided Design,
pp. 90-97, November, 1992.
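
The simulation-based toggle rate described above (transition count divided by simulated time) can be sketched as follows. This is an illustration under simplifying assumptions: the net's waveform is given as a hypothetical list of sampled logic values, whereas a real simulator would record transitions as events.

```python
def toggle_rate(samples, simulated_time_s):
    """Estimate a net's toggle rate from sampled logic values.

    Counts every change between consecutive samples and divides the
    count by the total simulated time.
    """
    transitions = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return transitions / simulated_time_s


# Hypothetical trace of one net covering 80 ns of simulated time.
trace = [0, 1, 1, 0, 1, 0, 0, 1]
rate_hz = toggle_rate(trace, 80e-9)
print(f"{rate_hz:.3e} transitions per second")
```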

Another method for estimating the number of transitions at each point
in a combinational logic circuit relies on a static analysis of the circuit. A
combinational logic circuit is composed of cells connected together by nets without
any feedback. The inputs to the entire combinational logic circuit are referred
to as primary inputs, while the final outputs of the entire combinational logic
circuit are referred to as primary outputs. The nets between cells are referred
to as internal nets of the design. One method of estimating the toggle rates at
each net in the combinational logic circuit involves assigning static
probabilities and toggle rates to each primary input, and computing the toggle
rates at other places in the design as a function of the static probability and
toggle rate values of the primary inputs.

The static probability of a particular net or input in a circuit is the
probability that the net will be at the value of Logic-1 at any point in time.
Physically, the static probability represents the fraction of time that the net
will hold the value of VDD.

This method involves computing and storing a representation of the
Boolean logic function at each internal node in the circuit. One of the
problems of this approach is that the functional representation may consume
large amounts of memory for combinational logic circuits. In addition, this
method has not been applied to circuits containing sequential elements.
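
To make the idea of static propagation concrete, the sketch below pushes static probabilities and toggle rates through single gates. It follows the transition-density formulation of the Najm paper cited above and assumes that the gate inputs are statistically independent and never switch simultaneously. It is an illustration of the general technique, not the procedure of any particular method described here, and the function names are hypothetical.

```python
def and_gate(pa, da, pb, db):
    """Return (static probability, toggle rate) at the output of a 2-input AND.

    An input transition reaches the output only while the other input is 1.
    """
    return pa * pb, pb * da + pa * db


def or_gate(pa, da, pb, db):
    """Return (static probability, toggle rate) at the output of a 2-input OR.

    An input transition reaches the output only while the other input is 0.
    """
    return 1 - (1 - pa) * (1 - pb), (1 - pb) * da + (1 - pa) * db


def not_gate(pa, da):
    """An inverter complements the probability and passes every transition."""
    return 1 - pa, da


# Hypothetical primary inputs: a is high 50% of the time with 1e6
# transitions per second; b is high 25% of the time with 2e6.
p_and, d_and = and_gate(0.5, 1e6, 0.25, 2e6)
p_or, d_or = or_gate(0.5, 1e6, 0.25, 2e6)
print(p_and, d_and, p_or, d_or)
```

Applying such rules gate by gate from the primary inputs toward the primary outputs yields a toggle rate for every internal net without simulation, at the cost of the independence assumption.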

Background Summary

Power dissipation in an integrated circuit presents an important design
consideration. Estimating the power dissipated by a design involves
considerations of computation time and accuracy. Conventional circuit power
estimation techniques have involved evaluating circuits that have been
specified to the layout or transistor level. This requires a substantial amount
of computation time to analyze the design at this level.

Conventional circuit power estimation techniques have also involved
simulation. The power estimate obtained from simulation requires computation
time proportional to the number of test patterns used. The utility of the power
estimate obtained from simulation also depends on the test patterns used. If
the test patterns do not represent typical conditions, then the power estimate
will not provide meaningful guidance to a designer.

WO 95/34036

Existing power estimates which are not based on simulation are faster 
than those which are. However, they only apply to a limited class of circuits, 
namely combinational logic. This greatly limits the use of this type of
technique. 

Existing power estimation techniques rely on a simple model of the 
power dissipated by a cell. Such models ignore leakage and cell internal 
power. Ignoring these effects reduces the accuracy of the estimate. 

SUMMARY OF THE INVENTION

One aspect of the present invention provides a designer with a fast
method of estimating the power dissipated by a circuit. The method reduces
the time required to get an estimate of a design's power, because the design
does not need to be mapped to the layout level, and instead uses information
available at the gate level. The method avoids the requirement of gate level
simulation by estimating the probabilities and the toggle rate at all nodes in the
circuit, utilizing static probability and toggle rate values at the inputs of the circuit.
Thus, this method returns a power estimate in less CPU time than earlier
approaches.

Another aspect of the present invention provides a method of
estimating the toggle rates in a circuit containing sequential elements
(flip-flops). This is accomplished by constructing a state element graph for the
circuit, breaking cycles in the graph, computing the toggle rate in the
combinational logic using the levels in the state element graph, and
transferring the toggle rates and probabilities across sequential elements.
Transferring the toggle rates and probabilities across sequential elements is
achieved by modeling any conventional sequential element as a generic
sequential element with additional combinational logic.
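
The graph steps named above (build a state element graph, break its cycles, and assign levels) can be illustrated with a small sketch. The text does not say here how cycles are chosen for breaking, so the depth-first back-edge removal below is an assumption, as are the function names and the example graph.

```python
def break_cycles(graph):
    """Remove depth-first back edges so the state element graph becomes acyclic.

    `graph` maps each sequential element to the elements it feeds.
    Returns (acyclic graph, list of removed edges).
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    acyclic = {node: [] for node in graph}
    removed = []

    def visit(node):
        color[node] = GRAY
        for succ in graph[node]:
            if color[succ] == GRAY:      # edge back into the current path
                removed.append((node, succ))
                continue
            acyclic[node].append(succ)
            if color[succ] == WHITE:
                visit(succ)
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return acyclic, removed


def assign_levels(acyclic):
    """Level of a node = length of the longest path reaching it from a source."""
    indegree = {node: 0 for node in acyclic}
    for node in acyclic:
        for succ in acyclic[node]:
            indegree[succ] += 1
    level = {node: 0 for node in acyclic}
    ready = [node for node in acyclic if indegree[node] == 0]
    while ready:
        node = ready.pop()
        for succ in acyclic[node]:
            level[succ] = max(level[succ], level[node] + 1)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return level


# Hypothetical graph: FF1 -> FF2 -> FF3 -> FF1 forms a cycle; FF4 feeds FF3.
seg = {"FF1": ["FF2"], "FF2": ["FF3"], "FF3": ["FF1"], "FF4": ["FF3"]}
acyclic_seg, removed_edges = break_cycles(seg)
levels = assign_levels(acyclic_seg)
print(removed_edges, levels)
```

Processing the combinational logic in increasing level order means that the probabilities and toggle rates feeding each stage are already available when that stage is evaluated.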

To enable handling of large circuits, a strategy for managing memory
blow-up has been developed. Large circuits require large amounts of memory
to represent their logic functions. This issue is addressed by approximating the static
probabilities at local inputs when computational problems are detected. This
strategy achieves good accuracy of power estimates while limiting memory use
and execution time.

An aspect of the present invention provides for improved accuracy and
fast computation in estimating the internal power dissipated by a cell. This
is achieved by a model which characterizes the power dissipated by the cell
during an output transition. The model is a function of the edge rate (or
transition time) of the inputs to a cell and the output capacitive loading of the
cell output. This power model of a cell reduces the time required to estimate
dissipated power, and represents a substantial improvement over previous
transistor level simulation methods.



BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows the conventional design process for a designer to
analyze and evaluate a design for power dissipation.

Figure 2 shows a CMOS inverter.

Figure 3 shows an improved design process for a designer to analyze
and evaluate a design for power dissipation.

Figure 4 shows a method of computing the stationary probabilities and
activity factors for a combinational logic circuit.

Figure 5 shows a method for computing the stationary probabilities and
activity for a circuit containing sequential elements.

Figure 6 shows a simple design containing combinational and
sequential cells with nets connecting the gates.

Figure 7 shows a sample State Element Graph (SEG).

Figure 8 shows a Modified State Element Graph that is created after
all cycles in the SEG are broken.

Figure 9 shows an interpolation into a two-dimensional lookup table for
cell internal power.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention comprises a novel method and apparatus for
quickly estimating the power in a digital circuit. The following description
is presented to enable any person skilled in the art to make and use the
invention, and is provided in the context of a particular application and its
requirements. Various modifications to the preferred embodiment will be
readily apparent to those skilled in the art, and the generic principles defined
herein may be applied to other embodiments and applications without
departing from the spirit and scope of the invention. Thus, the present
invention is not intended to be limited to the embodiment shown, but is to be
accorded the widest scope consistent with the principles and features disclosed
herein.

Figure 2 is a simplified block diagram illustrating a general purpose
programmable computer system, generally indicated at 200, which may be
used in conjunction with a first embodiment of the present invention. In the
presently preferred embodiment, a Sun Microsystems SPARC Workstation is
used. Of course, a wide variety of computer systems may be used, including
without limitation, workstations running the UNIX system, IBM compatible
personal computer systems running the DOS operating system, and the Apple
Macintosh computer system running the Apple System 7 operating system.
Figure 2 shows one of several common architectures for such a system.
Referring to Figure 2, such computer systems may include a central
processing unit (CPU) 202 for executing instructions and performing
calculations, a bus bridge 204 coupled to the CPU 202 by a local bus 206, a
memory 208 for storing data and instructions coupled to the bus bridge 204
by memory bus 210, a high speed input/output (I/O) bus 212 coupled to the
bus bridge 204, and I/O devices 214 coupled to the high speed I/O bus 212.
As is known in the art, the various buses provide for communication among
system components. The I/O devices 214 preferably include a manually
operated keyboard and a mouse or other selecting device for input, a CRT or
other computer display monitor for output, and a disk drive or other storage
device for non-volatile storage of data and program instructions. The
operating system typically controls the above-identified components and
provides a user interface. The user interface is preferably a graphical user
interface which includes windows and menus that may be controlled by the
keyboard or selecting device. Of course, as will be readily apparent to one
of ordinary skill in the art, other computer systems and architectures are
readily adapted for use with embodiments of the present invention.

Figure 3 shows a revised general design approach incorporating the
new estimation techniques. In step 1001, the semiconductor vendor and CAD
tool supplier cooperate to produce cell libraries much as was done in step
1000 of Figure 1. However, in addition to the other characterization
activities, the semiconductor vendor also estimates the internal energy
dissipated in a cell as a function of the input edge rate and output load, and
adds this information to the cell library description. This power modeling
information is supplied to the power analysis tool to provide for estimation of
internal energy of the cell.
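
A library entry of this kind can be pictured as a two-dimensional table of energy values indexed by input edge rate and output load, with values between the characterized points obtained by interpolation. The sketch below uses bilinear interpolation, which is an assumption about the interpolation scheme; the axis points, table values, units, and function name are hypothetical.

```python
from bisect import bisect_right


def interpolate_energy(edge_rates, loads, table, edge_rate, load):
    """Bilinear interpolation in a table indexed as table[edge rate][load].

    Both axes must be sorted in ascending order. Queries outside the
    table are extrapolated linearly from the nearest table cell.
    """
    def bracket(axis, x):
        # Lower index of the axis interval used for x, clamped to the table.
        i = bisect_right(axis, x) - 1
        return max(0, min(i, len(axis) - 2))

    i = bracket(edge_rates, edge_rate)
    j = bracket(loads, load)
    t = (edge_rate - edge_rates[i]) / (edge_rates[i + 1] - edge_rates[i])
    u = (load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - t) * (1 - u) * table[i][j]
            + (1 - t) * u * table[i][j + 1]
            + t * (1 - u) * table[i + 1][j]
            + t * u * table[i + 1][j + 1])


# Hypothetical characterization data for one pin of one cell.
edge_rates_ns = [0.1, 0.5, 1.0]
loads_pf = [0.01, 0.05, 0.10]
energy_pj = [
    [0.10, 0.12, 0.15],
    [0.14, 0.16, 0.20],
    [0.20, 0.23, 0.28],
]

energy = interpolate_energy(edge_rates_ns, loads_pf, energy_pj, 0.3, 0.03)
# Internal power for the pin: energy per transition times toggle rate.
power_w = energy * 1e-12 * 10e6   # 10 million transitions per second
print(f"{energy:.3f} pJ per transition, {power_w * 1e6:.2f} uW")
```

Multiplying the interpolated energy by the pin's toggle rate gives that pin's contribution to the cell's internal power.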

The designer specifies the design in step 1010 as was done in the
process of Figure 1. The design is mapped to gates in step 1020 as it was
done before. However, in step 1041, the power dissipated by the design is
estimated at the gate level using methods described later. The CAD system
uses conventional techniques to compute the transition times and capacitive
loads on each net. The remainder of the design process proceeds as it did in
Figure 1.

The revised design approach has distinct advantages over the previous
approach. The previous approach (Figure 1) did not perform power analysis
until the final stages of the design process. There, power estimation is only done at
the transistor level and requires more memory and execution time than the
revised design approach. The revised approach can be used earlier in the
design cycle, which enables power estimates to be included in the design
process at an earlier stage.

1.0 Power Estimation

As described previously, there are three sources of power dissipation:
leakage, switching, and internal. The total power dissipated by a design can
be computed by summing up the power dissipated by each of these sources.
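
The summation can be sketched as follows, assuming the per-cell and per-net contributions have already been estimated. The function name and the numeric values are hypothetical.

```python
def total_power(leakage_per_cell_w, switching_per_net_w, internal_per_cell_w):
    """Sum the three power components (all in watts) and return a breakdown."""
    breakdown = {
        "leakage": sum(leakage_per_cell_w),
        "switching": sum(switching_per_net_w),
        "internal": sum(internal_per_cell_w),
    }
    breakdown["total"] = (breakdown["leakage"]
                          + breakdown["switching"]
                          + breakdown["internal"])
    return breakdown


# Hypothetical two-cell, two-net design.
result = total_power(
    leakage_per_cell_w=[1e-9, 2e-9],
    switching_per_net_w=[2.5e-6, 2.5e-6],
    internal_per_cell_w=[1e-6, 2e-6],
)
print(result)
```

Returning the breakdown rather than only the total makes it easy to see which component dominates for a given design.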

1.1 Leakage Power Estimation

As previously described, leakage power represents the static or
quiescent power dissipated. It is generally independent of switching activity.
Thus, library developers can annotate gates with the approximate total leakage
power that is dissipated by the gate. Normally, leakage power is only a very
small component of the total power (much less than 1%), but it is important to model for
designs that are in an idle state most of the time; for example, circuits used for pagers and
cellular phones are often idle. Leakage power will be specified by a single
cell-level attribute in the library developed during step 1001 of Figure 3. The
leakage power of each cell is summed over all cells in the design to yield the
design's total leakage power dissipation.
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1.2 Switching Power Estimation 

As previously described, switching power comes from the current that
flows to charge or discharge the nets that connect cells. It occurs when the
output of a cell transitions from one voltage level to another. Switching power
for the entire design is the sum over all nets in the circuit of the power
dissipated on each net. The power dissipated on each net is the energy
dissipated on a transition at that net times the number of transitions per
second at that net. The energy dissipated in a transition is given by
$\frac{1}{2} C_n V_{DD}^2$, where $C_n$ represents the capacitance of that
particular net. It can be computed readily from the gate level libraries. The
number of transitions per second is the toggle rate. A new method for computing
the toggle rates will be described in a later section.
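
The per-net sum described above can be sketched in a few lines of Python. This is only an illustration: it assumes the conventional one-half C V squared energy per transition, and the `Net` record, its field names, and the SI units are assumptions made for the example rather than anything defined in this description.

```python
from dataclasses import dataclass

@dataclass
class Net:
    name: str
    capacitance: float   # farads (assumed unit)
    toggle_rate: float   # transitions per second

def switching_power(nets, vdd):
    """Sum over nets of (1/2 * C_n * VDD^2) * toggle rate, in watts."""
    return sum(0.5 * n.capacitance * vdd ** 2 * n.toggle_rate for n in nets)

# Two hypothetical nets on a 5 V supply.
nets = [Net("n1", 10e-15, 2e6), Net("n2", 20e-15, 1e6)]
total = switching_power(nets, vdd=5.0)
```

Each net contributes 0.25 microwatts here, so the total is about 0.5 microwatts.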

1.3 Internal Power Estimation

The total internal power dissipated in a design is the sum over all cells
in the design of the internal power dissipated in each cell. Part of the internal
power dissipation of a cell arises from the momentary electrical connection
between VDD and ground that occurs while an input is transitioning, and thus
turning on the P and N transistors simultaneously. This is called short-circuit
power. Another part of the internal power comes from the current that flows
while charging and discharging the internal capacitance of the cell. This is
called internal capacitive power.

A new internal power model is defined to model energy which is
consumed internal to the gate using input/output port characteristics. The
representation of the model used here is a data structure in the RAM of a
computer system operating a Computer-Aided Design system. The model
variables include input edge rates, port toggle rates, and output load
capacitance. Each pin of the library gate can be annotated with an internal
power table reference. The reference names a table of data values which
represent internal energy consumed due to a logic transition at that pin. The
table can be 1) a single scalar value, 2) a vector of values indexed by weighted
input transition times, or 3) a 2 dimensional table indexed by weighted input
transition times and output load capacitance. An energy value ($E_j$) is
extracted from the table by performing a linear interpolation between adjacent
table values as shown in Figure 9. The weighted transition time is computed by
taking the transition time $T_i$ of each cell pin $i$ and weighting it by the
pin's toggle rate $Tr_i$, using the following formulation:

$$\text{weighted transition time} = \frac{\sum_i (T_i \times Tr_i)}{\sum_i Tr_i}$$

Internal cell power for a given cell can be estimated as $\sum_j E_j\,Tr_j$,
where $E_j$ represents the energy dissipated due to a transition on signal $j$,
while $Tr_j$ represents the toggle rate on pin $j$.
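
As a rough illustration of the lookup just described, the following Python sketch computes a toggle-rate-weighted transition time, interpolates linearly in a 2 dimensional energy table, and sums energy times toggle rate over the pins of a cell. The function names, the clamping behavior outside the index range, and all numeric values are assumptions made for the example.

```python
from bisect import bisect_right

def weighted_transition_time(pins):
    """pins: list of (transition_time, toggle_rate) pairs for one cell."""
    total_rate = sum(rate for _, rate in pins)
    if total_rate == 0:
        return 0.0                      # assumption: idle cell, nothing to weight
    return sum(t * rate for t, rate in pins) / total_rate

def interp1(xs, ys, x):
    """Linear interpolation between adjacent entries; clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, x) - 1
    frac = (x - xs[k]) / (xs[k + 1] - xs[k])
    return ys[k] + frac * (ys[k + 1] - ys[k])

def lookup_energy(cap_index, trans_index, table, cap, trans):
    """2-D lookup: table[i][j] holds the energy at cap_index[i], trans_index[j]."""
    per_cap = [interp1(trans_index, row, trans) for row in table]
    return interp1(cap_index, per_cap, cap)

def internal_cell_power(energies_and_rates):
    """Sum of E_j * Tr_j over the pins j of one cell."""
    return sum(e * rate for e, rate in energies_and_rates)

# Illustrative numbers only (not taken from any real library).
cap_index, trans_index = [0.0, 5.0, 20.0], [0.1, 1.0, 5.0]
table = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [4.0, 8.0, 12.0]]
t_w = weighted_transition_time([(1.0, 3.0), (2.0, 1.0)])
e = lookup_energy(cap_index, trans_index, table, cap=2.5, trans=1.0)
```

A scalar or 1 dimensional table is the degenerate case of the same lookup with one row or one column.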

An important aspect of this model is that it models the variation of
energy dissipated due to the variation of both input transition times and output
load capacitances. If a signal transition takes a long time, then the P and N
transistors are both on for a longer period of time, thus allowing more charge
to flow and dissipating more energy. On the other hand, a fast transition limits
the amount of time that the P and N transistors in a cell can be on
simultaneously. From information available in the gate libraries, a static
timing analyzer (example: DesignTime from Synopsys, MOTIVE tool from Quad
Design) can compute the transition time $T_i$ at each cell input net $i$.

Method for Determining Input Transition Times in Circuit Netlist

Figure SFM33 is a drawing of a gate level netlist in which three
different library cells are instantiated. The name shown for each cell shows
the instance name, and the library cell name in parentheses. The available
technology library provides several different library cells which provide the
same logic function (AN2), but provide different circuit implementations with
different electrical characteristics. The library cells are stored in a function
table indexed by the function type (i.e. NAND2), shown in Figure SFM34.
To compute transition times, the circuit netlist is traversed in a breadth first
fashion. At each cell instance the output transition time is accessed, based on
the electrical characteristics of its attached pins, from a table of values for
that library cell. For example in Figure SFM33, the first cell S1 is traversed
and the output net n1 transition time is computed by accessing the transition
time values for the input nets A and B. Next the second cell S2 is traversed and
the output net n2 transition time is similarly computed by accessing the
transition time values for the attached input nets C and D. Finally, the first
cell on the next level S3 is traversed and its output net n3 transition time is
computed by accessing the transition time values for the attached input nets
n1 and n2.
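
The level-by-level traversal can be sketched as follows. The netlist encoding, the library cell names, and the `toy_model` function standing in for the per-cell table lookup are all hypothetical; only the S1, S2, S3 ordering mirrors the example above.

```python
from collections import deque

def propagate_transitions(cells, primary, out_transition):
    """cells: {instance: (lib_cell, [input nets], output net)}.
    primary: {net: transition time} for primary inputs.
    out_transition(lib_cell, input_times) stands in for the library table lookup."""
    trans = dict(primary)
    driven = {out for (_, _, out) in cells.values()}
    pending = {inst: sum(1 for n in ins if n in driven)
               for inst, (_, ins, _) in cells.items()}
    readers = {}
    for inst, (_, ins, _) in cells.items():
        for n in ins:
            readers.setdefault(n, []).append(inst)
    queue = deque(inst for inst, k in pending.items() if k == 0)
    order = []
    while queue:                        # breadth first: one logic level at a time
        inst = queue.popleft()
        lib_cell, ins, out = cells[inst]
        trans[out] = out_transition(lib_cell, [trans[n] for n in ins])
        order.append(inst)
        for nxt in readers.get(out, []):
            pending[nxt] -= 1
            if pending[nxt] == 0:
                queue.append(nxt)
    return trans, order

# Placeholder model: a real tool would read the cell's table instead.
def toy_model(lib_cell, input_times):
    return 0.2 + 0.5 * max(input_times)

cells = {"S1": ("AN2_x", ["A", "B"], "n1"),
         "S2": ("AN2_y", ["C", "D"], "n2"),
         "S3": ("AN2_z", ["n1", "n2"], "n3")}
primary = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
trans, order = propagate_transitions(cells, primary, toy_model)
```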



Method for Determining Output Load Capacitance 

The output load capacitance of each net is determined using the
following pseudo code traversal:

    for (all nets in circuit netlist) {
        output_load_cap = 0;
        for (all attached pins of net) {
            output_load_cap = output_load_cap + cap of pin;
        }
        store output_load_cap on net data structure;
    }

The method works by traversing all nets in the circuit netlist data
structure, and then, for each attached pin on the net, a capacitance value is
accessed and added to a sum for that net. Once a total value for the net is
calculated, it is stored onto a circuit netlist data structure.
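
The same traversal in Python, as a minimal sketch. The dictionary-based netlist and the pin naming scheme are assumptions for the example.

```python
def annotate_load_caps(nets, pin_cap):
    """nets: {net: [attached pin names]}; pin_cap: {pin name: capacitance}.
    Returns {net: total load capacitance}."""
    return {net: sum(pin_cap[p] for p in pins) for net, pins in nets.items()}

# Hypothetical netlist fragment: n2 fans out to two input pins.
nets = {"n1": ["S3/A"], "n2": ["S3/B", "S4/A"]}
pin_cap = {"S3/A": 1.0, "S3/B": 1.0, "S4/A": 1.0}
load_caps = annotate_load_caps(nets, pin_cap)
```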

1.4 Method for Reducing Circuit Power Dissipation

The following describes how a power model can be used by a designer
to minimize power dissipation in a gate level circuit. The following pseudo
code describes a method by which a designer can reduce the power
consumption of a design.



1000 current_power = compute power of circuit netlist;
1000 for (each cell in circuit netlist) {
1001   for (each alternative library cell which provides same function) {
1002     current_library_cell name is saved;
1003     instantiate alternative library cell;
1004     new_power = compute power of circuit netlist;
1005     if (new_power > current_power) {
1006       reinstantiate back to current_library_cell
1007     }
1008     else {
1009       current_power = new_power;
1010     };
1011   }
1012 }



In this pseudo code, the designer is using the power estimation tool to
evaluate alternative library cell instantiations in the circuit netlist to
determine which instantiation provides the least power dissipation. After each
instantiation of an alternative library cell, the designer uses the power
estimation tool to compute the power dissipation of the entire circuit. At line
1001, the library function data structure in Figure SFM34 is accessed to find
all the library cells which implement the same function as the original library
cell.
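
A compact Python rendering of this greedy loop is given below as a sketch. The `estimate_power` callback, the per-cell power numbers, and the cell names `AN2P` and `AN2LP` are hypothetical; a real flow would call the full power estimator at each step.

```python
def reduce_power(netlist, alternatives, estimate_power):
    """netlist: {instance: library cell}; alternatives: {library cell: [equivalents]}.
    Tries each functionally equivalent cell and keeps a swap unless power rises."""
    current_power = estimate_power(netlist)
    for inst in list(netlist):
        for candidate in alternatives.get(netlist[inst], []):
            saved = netlist[inst]               # remember the current cell
            netlist[inst] = candidate           # instantiate the alternative
            new_power = estimate_power(netlist)
            if new_power > current_power:
                netlist[inst] = saved           # worse: revert
            else:
                current_power = new_power       # keep the swap
    return current_power

# Toy estimator: power is a fixed number per library cell (assumption).
cell_power = {"AN2": 3.0, "AN2P": 5.0, "AN2LP": 1.0}
estimate = lambda nl: sum(cell_power[c] for c in nl.values())

netlist = {"S1": "AN2", "S2": "AN2"}
final_power = reduce_power(netlist, {"AN2": ["AN2P", "AN2LP"]}, estimate)
```

Because every trial re-estimates the whole circuit, the cost grows with the number of cells times the number of alternatives, which is why a fast gate-level estimator matters here.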

1.5 Calculating Cell Energy as a Function of Edge Rates

Today, semiconductor manufacturers provide libraries of standard cells
that perform various functions to designers. Designers use CAD tools to
select appropriate cells to construct a larger circuit. Some CAD tools use logic
synthesis to select cells from the library. To evaluate the behavior of the
resulting total design, the CAD tool determines the characteristics of the
entire design from the particular characteristics of individual cells as well as
from the interactions of connected cells. To allow the CAD tool to perform this
global analysis, the semiconductor vendor computes various characteristics of
each cell and passes the results of those computations along to the CAD tool
vendor and to the designer. Analysis tools in the CAD tool suite use this
information to provide the designer with information about the area, power
and delay associated with a particular design.

Each cell is specified as a geometric pattern of different layers of
various materials. Each cell performs a particular logic function using the
electrical circuits formed from these patterns. As part of providing the library
to the designer, the semiconductor vendor routinely uses tools such as SPICE
to determine, for example, how long the circuit takes to generate an output
from a given set of inputs.

The semiconductor vendor (or tool user) provides a library of cells,
with characterization data for each of the library cells. The characterization
data includes: 1) pin capacitance values, 2) internal power model, 3) delay
model information. The model information is extracted from a transistor level
netlist using a process termed cell characterization. During characterization,
a transistor level simulation (SPICE) is performed using a set of input stimuli
which model signal transitions under various conditions. A power value for each
of the set of conditions is extracted into a table of raw internal energy data
values. The raw data is then compressed into power model values by using a
straightforward averaging compression scheme. An aspect of the invention
provides for a mechanism by which this data can be supplied to a power analysis
tool as described above.

One method for constructing the relevant energy tables for each cell
would be to apply different input patterns to each cell with different
transition times (transition) and output capacitive loads (capacitance), and
compute the average energy dissipated for a given transition time and output
load. In one embodiment, the tables are a data structure in the memory of a
computer system. In particular, the energy tables for a particular cell could be
constructed by a cell characterization system using the following pseudo-code
approach:

    for each output
    {
      for (capacitance = cap_start; capacitance <= cap_end; capacitance += cap_step)
      {
        for (transition = trans_start; transition <= trans_end; transition += trans_step)
        {
          for (input = 1; input <= N_inputs; ++input)
          {
            /* Simulate rise and fall at the output */
            rise_energy = get_rise_energy();
            fall_energy = get_fall_energy();
            avg_energy[input] = (rise_energy + fall_energy)/2;
          }
          max_energy_of_inputs = max(avg_energy[input]);
          write_table_output(transition, capacitance, max_energy_of_inputs);
        }
      }
    }



Here capacitance is the output load capacitance, transition is the input pin
transition time, and N_inputs is the number of cell inputs. Rise_energy is the
energy dissipated during a low to high signal transition, and fall_energy is the
energy dissipated during a high to low transition.
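
The characterization sweep can be sketched in Python as shown below. The `simulate` callback is a stand-in for a transistor level (SPICE) run and its toy formula is invented for the example; only the loop structure (average of rise and fall, then the maximum over inputs) follows the pseudo code.

```python
def characterize(caps, transitions, n_inputs, simulate):
    """Build {(capacitance, transition): energy} for one cell output.
    simulate(inp, cap, trans) -> (rise_energy, fall_energy)."""
    table = {}
    for cap in caps:
        for trans in transitions:
            avg_energy = []
            for inp in range(n_inputs):
                rise, fall = simulate(inp, cap, trans)
                avg_energy.append((rise + fall) / 2)
            table[(cap, trans)] = max(avg_energy)   # worst case over inputs
    return table

# Toy simulator (assumption): energy grows with load and transition time.
def toy_simulate(inp, cap, trans):
    return cap + trans + inp, cap + trans

energy_table = characterize([0.0, 5.0, 20.0], [0.1, 1.0, 5.0], 2, toy_simulate)
```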



WO 95/34036                PCT/US95/07040

In this approach, a 2 dimensional table of data values with indexes of
input transition (transition) and output load (capacitance) is developed. This
table is supplied to the power analysis tool in the cell library description to
be used during power estimation calculation. An example description is provided
below.



library(power2_sample) {
    time_unit : "1ns";              /* required for power units calculation */
    voltage_unit : "1V";            /* required for power units calculation */
    current_unit : "1uA";
    capacitive_load_unit (0.1, ff); /* required for power */
    pulling_resistance_unit : "1kohm";

    /*
      Units for internal energy table must be (V**2)*C;
      for this example Internal power = (1v)**2 * .1ff = .1 fJoules.
      The # displayed by Design power in report_power command
      is V**2 * C * (1/time_unit); for this example it is .1 uW
    */
1032 

    /* define unit for leakage power values */
    leakage_power_unit : 1nW;
    default_cell_leakage_power : 0.2;

    /* define scaling for leakage power values to
       compensate for changes in voltage, temperature
       and process */
    k_volt_cell_leakage_power : 0.000000;
    k_temp_cell_leakage_power : 0.000000;
    k_process_cell_leakage_power : 0.000000;

    k_volt_internal_power : 0.000000;
    k_temp_internal_power : 0.000000;
    k_process_internal_power : 0.000000;

    /* Define template for 2 dimensional table; indexes are defined to be the
       total output net capacitance and the input pin transition time. The index
       values by which table values will be determined are listed in the index_1
       and index_2 attributes */
    power_lut_template(output_by_cap_and_trans) {
        variable_1 : total_output_net_capacitance;
        variable_2 : input_transition_time;
        index_1 ("0.0, 5.0, 20.0");
        index_2 ("0.1, 1.00, 5.00");
    }

    /* Define template for 1 dimensional table; index is defined to be
       the input pin transition time. The index values by which table values will
       be determined are listed in the index_1 attribute */
    power_lut_template(input_by_trans) {
        variable_1 : input_transition_time;
        index_1 ("0.1, 1.00, 5.00");
    }

    /* 2 input combinational logic cell description AND2 */
    cell(AN2) {
        area : 2;
        pin(A) {
            direction : input;
            capacitance : 1;
        }
        pin(B) {
            direction : input;
            capacitance : 1;
        }
        pin(Z) {
            direction : output;
            function : "A B";
            timing() {
                intrinsic_rise : 0.48;
                intrinsic_fall : 0.77;
                rise_resistance : 0.1443;
                fall_resistance : 0.0523;
                slope_rise : 0.0;
                slope_fall : 0.0;
                related_pin : "A";
            }
            timing() {
                intrinsic_rise : 0.48;
                intrinsic_fall : 0.77;
                rise_resistance : 0.1443;
                fall_resistance : 0.0523;
                slope_rise : 0.0;
                slope_fall : 0.0;
                related_pin : "B";
            }
        }

        /************************************************
          Output Power for Z Output
          Defines 2d table values for internal
          power consumed during transition at pin Z
        ************************************************/
        cell_leakage_power : 1;
        internal_power(output_by_cap_and_trans) {

            values("4.000000, 8.000000, 40.000000", \
                   "2.000000, 6.000000, 35.000000", \
                   "1.000000, 5.000000, 30.000000");
            related_inputs : "A B";
            related_outputs : "Z";
        }
    }
1116 

uZSo^'''"' «P-"opsc<,u«tia. Clement V 

        area : 7;
        pin(D) {
            direction : input;
            capacitance : 1;
            timing() {
                timing_type : setup_rising;
                intrinsic_rise : 0.8;
                intrinsic_fall : 0.8;
                related_pin : "CP";
            }
            timing() {
                timing_type : hold_rising;
                intrinsic_rise : 0.4;
                intrinsic_fall : 0.4;
                related_pin : "CP";
            }
        }
        pin(CP) {
            direction : input;
            capacitance : 1;

1139 miiv_pul»-wid!h_high: IJ; 

1140 min^pulse.widthjow : U; 

        }
        ff(IQ,IQN) {
            next_state : "D";
            clocked_on : "CP";
        }

        /************************************************
          Internal Power for Clock Input:
          describes table for internal power consumed
          during a transition at input pin CP.
        ************************************************/
        internal_power(input_by_trans) {
            values("0.550000, 0.600000, 0.700000");
            related_input : "CP";
        }
        pin(Q) {
            direction : output;
            function : "IQ";
            timing() {
                timing_type : rising_edge;
                intrinsic_rise : 1.09;
                intrinsic_fall : 1.37;
                rise_resistance : 0.1458;
                fall_resistance : 0.0523;
                related_pin : "CP";
            }
        }

        /************************************************
          Output Power for QN,Q Outputs
          Defines 2d table values for internal
          power consumed during transition at pin Q,QN
        ************************************************/
        cell_leakage_power : 1;
        internal_power(output_by_cap_and_trans) {
            values("4.000000, 8.000000, 40.000000", \
                   "2.000000, 6.000000, 35.000000", \
                   "1.000000, 5.000000, 30.000000");
            related_inputs : "CP D";
            related_outputs : "Q QN";
        }

        pin(QN) {
            direction : output;
            function : "IQN";
            timing() {
                timing_type : rising_edge;

184 intrinsicjrisc : 1^9; 

185 intrinsicjfail : 1^; 

                rise_resistance : 0.1458;
                fall_resistance : 0.0523;
                related_pin : "CP";
            }
        }
    }
} /* end of library power_sample */

2.0 Computing toggle rates

As previously described, CMOS gates dissipate energy during output
transitions from one to zero or from zero to one. In order to compute the power
dissipated by a gate in the circuit, the energy dissipated by the gate per
transition is computed and multiplied by the number of transitions per second
(also referred to as toggle rate) that occur at the output of the gate. The
average power dissipated by the design is obtained by summing up the power
values for each gate in the circuit.

One method of computing toggle rates for the nets in the circuit is by
simulating the circuit with a set of input stimuli, counting the number of
transitions at each net, and dividing by the appropriate time unit. This method
gives accurate values for the toggle rates of nets in the circuit. The
simulation-based method is slow because the entire circuit has to be simulated
for each input vector that is applied. A faster but potentially less accurate
method is the probabilistic method. As described previously, static probability
at a point in a net is an estimate of the total fraction of time that the node
spends at the logic value of one. This method takes static probability values
and toggle rates for every primary input and estimates the toggle rates at the
internal nodes and outputs from the values at the primary inputs. The
probabilistic method can be several orders of magnitude faster than the
simulation-based mechanism because there are no vectors required. This method
is very advantageous in situations where a quick estimate of the average power
dissipation is desired. This situation typically arises in a high-level design
environment where, for example, designers will make tradeoffs between different
implementations for modules. In this situation, it is not necessary to get a
highly accurate power value because it is very early in the design cycle.
However, it is important to produce the estimate quickly.
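
For comparison with the probabilistic method, the simulation-based count can be sketched as below. The trace format (one sampled logic value per time step for each net) and the sample-based estimate of static probability are assumptions made for the example.

```python
def toggle_rates_from_trace(trace, duration):
    """trace: {net: [sampled logic values in time order]}.
    duration: length of the simulated interval, in the chosen time unit."""
    rates = {}
    for net, values in trace.items():
        transitions = sum(1 for a, b in zip(values, values[1:]) if a != b)
        rates[net] = transitions / duration
    return rates

def static_probability(values):
    """Fraction of samples at logic one (a sample-based approximation)."""
    return sum(values) / len(values)

trace = {"n1": [0, 1, 1, 0, 1], "n2": [1, 1, 1, 1, 1]}
rates = toggle_rates_from_trace(trace, duration=4.0)
p_n1 = static_probability(trace["n1"])
```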

The next section explains how to compute the toggle rates for a circuit
containing only combinational logic. The section following that describes how
to use the combinational logic method as part of the process of computing
toggle rates for circuits containing both sequential and combinational
elements.

2.1 Computing toggle rates for a Combinational Logic Circuit 

In order to compute the toggle rates in a combinational circuit,
probabilities and toggle rates are first annotated on the primary inputs. After
that is completed, the logic function is computed at each net in the circuit
with respect to the primary inputs in the transitive fanin of the net. For each
function, boolean difference functions and their probabilities are computed
with respect to each input. The toggle rate for the function (and hence the
associated net) is calculated using these values and the toggle rates of the
primary inputs (which are already given). "Transition Density, A Stochastic
Measure of Activity in Digital Circuits", by Farid N. Najm, paper 38.1 in the
28th ACM/IEEE Design Automation Conference, 1991, explains a basic
process for computing what are referred to here as toggle rates, and is hereby
incorporated by reference. "Estimation of Average Switching Activity in
Combinational and Sequential Circuits", by Abhijit Ghosh, Srinivas Devadas,
Kurt Keutzer and Jacob White, in the 29th ACM/IEEE Design Automation
Conference in 1992, provides another process for computing what are referred
to here as toggle rates, and is hereby incorporated by reference.

As described in the literature, computing toggle rates in circuits
requires computing various boolean functions. Computing these functions
requires data structures and algorithms. One efficient method of processing
boolean functions involves Binary Decision Diagrams (BDDs). "Efficient
Implementation of a BDD Package", by Karl S. Brace, Richard L. Rudell,
and Randal E. Bryant, paper 3.1 of the 27th ACM/IEEE Design
Automation Conference, 1990, describes how to implement and use BDDs,
and is hereby incorporated by reference. "Logic Verification using Binary
Decision Diagrams in a Logic Synthesis Environment", by Sharad Malik,
Albert R. Wang, Robert K. Brayton, and Alberto Sangiovanni-Vincentelli,
Proceedings of ICCAD, 1988, describes methods of efficiently building BDDs
for large circuits and is hereby incorporated by reference. Software for
manipulating BDDs can be obtained from the SIS-BDD package available
electronically using the FTP command from ic.berkeley.edu.

Figure 4 provides a flow chart for a method of computing the toggle
rates of a combinational logic function assuming zero delay on the gates. The
choice of delay model affects the accuracy of the power computation. A more
accurate delay model provides a more accurate power estimate. However,
zero delay power estimates are computationally cheaper to compute than unit
delay or general delay models. In a preferred embodiment, zero delay models
are used.

The process begins at step 4000 by ordering the primary outputs based
on their depth (in terms of levels of logic) from the primary inputs of the
network. The primary outputs with smaller depth are placed before primary
outputs with greater depth. The intuition here is that the greater the number
of levels of logic for a primary output, the larger the BDD required to
represent that output. The ordering of the primary inputs is derived from the
primary output ordering by placing "deeper" variables ahead of "shallow"
variables. This approach to variable ordering is similar to the one described
in "Logic Verification using Binary Decision Diagrams in a Logic Synthesis
Environment", described earlier. That paper also describes a framework for
building BDDs in large networks and addresses some memory issues. In addition,
"Dynamic Variable Ordering for OBDDs", by Richard Rudell in Proceedings
of ICCAD, 1993, describes other methods for doing this, and is hereby
incorporated by reference.

The process continues to step 4001 by specifying the toggle rate on
each primary input net as well as the static probability for each primary input.
The static probability for each input is the probability that the input is equal
to a logical one. A BDD variable is created for each primary input. The
ordering of the inputs (and hence the BDD variables) is determined by step
4000. The output nets are pushed onto a stack such that the "shallowest"
output (from step 4000) is at the top of the stack. Each net in the circuit that
is not a primary input is marked unprocessed. Each net in the circuit is given
an integer value that is set to the number of fanouts of that net (fan-out count).

If the stack is empty, then the process terminates with the toggle rates 
computed, as shown in step 4010. 

At step 4020, it is determined if the net at the top of the stack is ready
to have its static probability and toggle rate computed. A net is ready if all
of the inputs to the gate driving it have been processed, or the net is a
primary input.

If the top of the stack is not ready, push all nets that are unprocessed
inputs to the gate driving the net at the top of the stack onto the stack as
shown in step 4030, and return to step 4020.

If the top of the stack is ready, then compute the boolean function of
the net at the top of the stack from its inputs using the BDD package as shown
in step 4040. In addition, compute the boolean difference functions for
each of the inputs as required by "Transition Density, A Stochastic Measure of
Activity in Digital Circuits", which is described above.

Step 4050 tests to see if the BDD package had enough memory to
complete the computations in step 4040. As will be explained later, during
the course of processing, it may be necessary for a particular internal net to
be treated as though it were an independent input. Such a net is referred to
as a pseudo-primary input.

If there was enough memory, then the toggle rate and the static
probability of net i, on the top of the stack, can be computed as indicated by
step 4070. Compute the static probability of the net by computing the
probability that the boolean function $f_i$ is one. Compute the toggle rate
$Tr_i$ of the net i using

$$Tr_i = \sum_j P\left(f_i|_{x_j=1} \oplus f_i|_{x_j=0}\right)\, Tr_{x_j}$$

where $\oplus$ denotes the "exclusive or" operator, $P$ denotes the probability
operator, and $x_j$ denotes the j-th primary or pseudo-primary input. The
operation described above to compute the toggle rate for the net i is an
expensive one. This is because we have to build as many boolean differences
(represented as BDD formulas) as there are primary variables. In addition, for
each boolean difference BDD we have to compute the static probability in order
to obtain the coefficient of the corresponding primary input's toggle rate. In
order to address this CPU bottleneck, the idea of "pooling" was used. The static
probabilities of the boolean difference BDDs are evaluated simultaneously
instead of individually. This is because the static probability computation
results in several intermediate results (i.e. BDDs for smaller formulas) getting
computed for free. So if we group all the boolean differences and then
compute the static probability, there is greater likelihood of sharing
intermediate formulas (and hence results) across the different static
probability computations. This method will ensure that the same sub-formula is
never evaluated twice. The "pooling" mechanism helps to reduce considerably the
run time of the probabilistic analysis, and thus permits an increase in the size
of the circuit that can be evaluated.
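
The toggle rate computation can be illustrated on a small function without a BDD package. The sketch below enumerates the truth table instead, which is exponential in the number of inputs and is therefore only a teaching aid, not the BDD-based or pooled evaluation described here. It assumes independent inputs; the function and variable names are hypothetical.

```python
from itertools import product

def probability(func, names, p_one):
    """P(func == 1), assuming independent inputs with P(x = 1) = p_one[x]."""
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        if func(dict(zip(names, bits))):
            weight = 1.0
            for name, bit in zip(names, bits):
                weight *= p_one[name] if bit else 1.0 - p_one[name]
            total += weight
    return total

def toggle_rate(func, names, p_one, rates):
    """Sum over inputs x of P(f|x=1 XOR f|x=0) * toggle rate of x."""
    total = 0.0
    for x in names:
        # Boolean difference of func with respect to x.
        diff = lambda env, x=x: func({**env, x: 1}) != func({**env, x: 0})
        total += probability(diff, names, p_one) * rates[x]
    return total

# Two-input AND gate with hypothetical input statistics.
and2 = lambda env: env["a"] & env["b"]
names = ["a", "b"]
p_one = {"a": 0.5, "b": 0.5}
rates = {"a": 2.0, "b": 4.0}
p_out = probability(and2, names, p_one)
tr_out = toggle_rate(and2, names, p_one, rates)
```

For the AND gate the Boolean difference with respect to a is simply b, so each input's toggle rate is weighted by 0.5 and the output toggle rate is 3.0.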

Note that evaluated net i is driven by a particular gate. For every input
net to this gate, ensure that it is not a primary input or pseudo-primary input
and decrement the fan-out count on that net. If the fan-out count on an input
to the gate corresponding to this net reaches 0, release the BDD associated
with the function on that input net because it will not be needed any more, i.e.
all the gates that needed that net's formula have already used it. A crucial
advantage of this technique is the efficient usage of BDD formulas in the
circuit. No BDD formula remains allocated unless it is required for a later
computation. This helps conserve memory, which in turn lowers the peak
memory usage of this software. Pop the stack and return to the decision of
step 4010.
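
The stack discipline and the fan-out based release of stored functions can be sketched as follows. This is a control-flow illustration only: the `evaluate` callback stands in for the BDD construction of step 4040, the memory check of step 4050 is omitted, and the netlist encoding is an assumption.

```python
def process_nets(gates, primary_inputs, outputs, evaluate):
    """gates: {net: [input nets of the gate driving it]}.
    outputs: primary outputs ordered shallowest first."""
    fanout = {}
    for ins in gates.values():
        for n in ins:
            fanout[n] = fanout.get(n, 0) + 1
    processed = set(primary_inputs)
    live = {}                         # net -> stored function (BDD stand-in)
    released = []
    stack = list(reversed(outputs))   # shallowest output ends up on top
    while stack:
        net = stack[-1]
        if net in processed:          # already handled via another path
            stack.pop()
            continue
        waiting = [n for n in gates[net] if n not in processed]
        if waiting:                   # not ready: push unprocessed inputs
            stack.extend(waiting)
            continue
        live[net] = evaluate(net, gates[net], live)
        processed.add(net)
        for n in gates[net]:
            if n in primary_inputs:
                continue
            fanout[n] -= 1
            if fanout[n] == 0:        # no remaining reader: free the function
                released.append(n)
                del live[n]
        stack.pop()
    return processed, released

gates = {"n1": ["A", "B"], "n2": ["C", "D"], "n3": ["n1", "n2"]}
describe = lambda net, ins, live: "f(" + ",".join(ins) + ")"
processed, released = process_nets(gates, {"A", "B", "C", "D"}, ["n3"], describe)
```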

2.2 Memory recovery techniques 

One of the most important characteristics of this probabilistic
estimation method is its speed. Usually BDD based approaches suffer from
capacity as well as run time problems, i.e. they do not work for large circuits
and work slowly for relatively large circuits. The advantage of the technique
presented in Fig. 4 is its efficiency in dealing with all circuits, regardless
of size. An important factor in the speed of this method vis-a-vis other methods
is the efficient algorithm that is used to reclaim memory during the BDD
manipulation steps.

Step 4040 in Fig. 4 contains two operations where there may not be
enough memory to perform the computations required. These are the BDD
building step for a net and the toggle rate computation step. Since BDD
building operations can lead to a dramatic increase in the number of BDD
nodes, we place a memory cap on the BDD package. Placing an upper
bound on the number of nodes in the BDD automatically restricts the amount
of memory the BDD package can allocate and hence controls the behavior of
the BDD package when large BDDs are being processed. Since a cap is
placed, it is also important to come up with a strategy to deal with the
memory overflow problem. This is also referred to as the "blowup" problem.

The blowup strategy that is used has three important properties. First,
it only frees those formulas from which large chunks of memory can be
recovered. In addition, it also tries to minimize the number of BDDs freed.
Finally, it should account for a small fraction of the overall runtime of the
power analysis. Whenever a BDD at an intermediate net is freed, that point
in the circuit is treated as a pseudo-primary input. The static probability and
the toggle rate are computed for that node and the new node is assumed to be
an independent primary input, i.e. it is not correlated to any of the other
primary inputs that exist in the circuit. This assumption is a source of
inaccuracy, because two inputs to a gate downstream from the newly created
primary input may be treated as unrelated when in reality they share some
common primary inputs. Due to the accuracy implications of creating
pseudo-primary inputs, the blowup strategy used tries to minimize the number of
BDDs that have to be freed. Since reclaiming memory is another important
goal, it is important for the blowup strategy to be effective in recovering
memory. The blowup operations appear in Step 4060 of Fig. 4.

In the previous section, a method was described to conserve memory
by storing only those BDDs that are needed for future evaluation. These
BDDs correspond to those gates in the circuit which are connected to inputs
of unprocessed gates in the network. This set of gates which have BDDs is
referred to as the "frontier." Each of the gates in the frontier also has the
property that its fan-out count is non-zero. The frontier is a dynamically
changing set of gates that keeps getting updated every time a gate is processed
in Step 4070. In the blowup strategy, the first step is to identify the set of
candidate BDDs that can be freed. This is directly obtained by examining the
frontier. In order to speed up the blowup step, the frontier is maintained
dynamically by adding and removing gates from it as every gate in the circuit is
being traversed.

Compute the size of each BDD in the frontier. These BDDs are then
sorted in decreasing order of size. Starting with the largest BDD and its
associated net, free that BDD and create a new BDD variable associated with that
net. These variables are pseudo-primary inputs. Define the static probability
and toggle rate of the new variable as the static probability and toggle rate of
that net (already computed in Step 4040). Continue replacing BDDs with variables
until the memory used by the active BDDs (number of non-variable nodes in
the BDD) reaches a predetermined level. In one embodiment, BDDs are freed
until the memory used is less than 50% of the memory available to the BDD
package. When the predetermined level is reached, the blowup strategy
terminates and the normal traversal of the circuit resumes to compute and
evaluate BDDs at the unprocessed gates in the circuit. In practice, this
strategy is known to work very well for several large circuits. The average
percentage of formulas freed by one embodiment of the strategy is 8% (2 or
3 BDDs) and the runtime impact is about 1% of the overall runtime of the
power analysis.
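
The selection step of the blowup strategy (largest frontier BDDs first, until usage falls below the threshold) can be sketched as below. The sketch assumes that the node counts of different BDDs simply add up, ignoring nodes shared between formulas, and the function name and default 50% target mirror the embodiment mentioned above only by way of example.

```python
def blowup(frontier_sizes, limit_nodes, target_fraction=0.5):
    """frontier_sizes: {net: BDD node count} for nets on the frontier.
    Returns (nets to turn into pseudo-primary inputs, node count remaining)."""
    used = sum(frontier_sizes.values())
    freed = []
    by_size = sorted(frontier_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for net, size in by_size:
        if used < target_fraction * limit_nodes:
            break                     # enough memory recovered
        freed.append(net)             # this net becomes a pseudo-primary input
        used -= size
    return freed, used

# Hypothetical frontier: one large BDD dominates the memory use.
freed, used = blowup({"n1": 600, "n2": 250, "n3": 100}, limit_nodes=1000)
```

Freeing only the single largest BDD brings usage from 950 to 350 nodes here, which keeps the number of pseudo-primary inputs, and hence the accuracy loss, small.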

2.3 Accuracy improvements for combinational logic circuits 

In the method of Figure 4, step 4060 showed one way to recover
memory. This method is very fast but there is a loss of accuracy resulting
from this step. If more accuracy is desirable, an alternate method can be
used to compute the static probabilities and toggle rates without possibly
having to create pseudo-primary inputs. This method involves re-trying failed
outputs (i.e. outputs of the circuit for which BDDs could not be built) and
trying a different variable ordering for their inputs.

After determining at step 4050 that there is insufficient memory, one
could abort the processing for that output (instead of firing off the blowup
strategy) and add that primary output to a list of failed outputs. This way, at
the end of one pass of the algorithm shown in Figure 4, some of the primary
outputs would have been successfully processed (without memory blowup) and
there might be some which could not be processed due to the given variable
ordering. Prune the circuit to remove the successful outputs and run another
pass (step 4000) of the algorithm on the pruned circuit. This may result in a
different input order being derived for the primary inputs. As a result, some
of the outputs that failed with the earlier order could succeed with the new
order. Continue to iterate until all outputs have been evaluated or there is
a set of outputs for which an input order could not be derived.

For the unresolved set of outputs (usually a small subset of the original
set of outputs), we could go back to Step 4000 with the blowup strategy
enabled in Step 4050, to estimate toggle rate and static probabilities for these
outputs. This would impact the accuracy of the estimates, but not as much as
if all the primary outputs were processed using the blowup strategy.
Alternately, to be even more accurate, each primary output in the failed list
could be taken in turn and processed using the method in Figure 4.
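
The retry procedure above can be sketched as a simple loop. The callbacks
derive_order and try_build, the pass limit, and the rule of stopping when the
derived order repeats are hypothetical stand-ins for the ordering and BDD
construction steps of Figure 4; they are assumptions, not part of the
specification.

```python
def analyze_with_retries(outputs, derive_order, try_build, max_passes=10):
    """Re-run the analysis on failed outputs only, re-deriving the input order.

    derive_order(pending) -> an input ordering for the pruned circuit
    try_build(output, order) -> True if BDDs fit in memory under that order
    Returns (resolved, unresolved).
    """
    pending = list(outputs)
    resolved = []
    previous_order = None
    for _ in range(max_passes):
        if not pending:
            break
        order = derive_order(pending)
        if order == previous_order:
            break  # assumption: a repeated order means no new order exists
        failed = [po for po in pending if not try_build(po, order)]
        resolved += [po for po in pending if po not in failed]
        pending, previous_order = failed, order
    return resolved, pending
```

Outputs left in the unresolved list would then be handled with the blowup
strategy enabled, or one at a time, as described above.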

2.3.1 Additional Examples of Combinational Logic Analysis
Referring to Figure A, there is shown an illustrative schematic diagram
of an exemplary electronic circuit. The circuit has multiple primary inputs
I1-I9, multiple primary outputs PO0, PO1 and PO2, and multiple gates
N1-N10. Each gate is represented by a netlist node stored in memory. Each
wire connection between gates is represented by a net stored in memory.
Each primary input and each primary output also is represented by a netlist
node. The Figure A diagram also serves to illustrate a netlist stored in
electronic memory that represents the circuit.

A presently preferred technique for estimating average power
consumption by the exemplary electronic circuit of Figure A in accordance
with the invention involves first ranking the primary outputs in an order
which depends upon the maximum number of logic levels between each
primary output and the primary inputs that feed such primary output. For
example, the maximum number of combinational logic levels between primary
output PO0 and the primary inputs that feed into primary output PO0 is one.
The only logic gate that feeds PO0 is N1. The maximum number of logic
levels that feed PO1 is four. Primary inputs I3 and I5 feed into PO1 through
gates N2, N8, N9 and N10. The maximum number of logic levels that feed
PO2 is three. Primary inputs I7 and I8 feed into PO2 through N3, N4 and N5.

The primary outputs are ranked in increasing order of maximum logic
level depth. That is, the primary output with the lowest maximum number of
logic levels between it and a primary input is ordered first, and the primary
output with the highest maximum number of logic levels between it and a
primary input is ranked last. Referring to the illustrative drawings of Figure
B, the set of primary outputs from the electronic circuit are shown ranked in
order from lowest maximum logic level depth to highest maximum logic level
depth: PO0, followed by PO2, followed by PO1.
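
The ranking can be sketched in Python as follows. The dictionary layout
(each gate or output name mapped to the names that drive it) and the function
name rank_outputs are assumptions for illustration; the exact connectivity of
Figure A is not reproduced here.

```python
from functools import lru_cache


def rank_outputs(fanin, primary_outputs):
    """Order primary outputs by increasing maximum logic-level depth.

    fanin maps each gate or output name to the names that drive it; names
    absent from fanin are treated as primary inputs (depth 0).
    """
    @lru_cache(maxsize=None)
    def depth(node):
        drivers = fanin.get(node)
        if not drivers:
            return 0
        # An output port adds no logic level of its own; a gate adds one.
        level = 0 if node in primary_outputs else 1
        return level + max(depth(d) for d in drivers)

    return sorted(primary_outputs, key=depth)
```

For a circuit in which PO0 sits behind one gate, PO2 behind three levels and
PO1 behind four, the function would return PO0, PO2, PO1 in that order.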

Referring to the illustrative drawings of Figure C, there is shown a diagram
of the exemplary netlist with nets annotated in accordance with fanout numbers.
The netlist that represents the circuit will be described with reference to
Figure A since netlist nodes represent circuit gates and netlist nets represent
circuit wires. The fanout count of a given net equals the number of gates that
receive an input from that net. For example, net 2000 has a fanout count of
1, since it only feeds a netlist node representing a single gate N1. The fanout
count of net 2002 is 2, since it feeds two netlist nodes representing gates N1
and N6. The fanout count of net 2004 also is two since it feeds two netlist
nodes representing gate N2 and gate N10. Net 2006 has a fanout count of
2 since it feeds two netlist nodes representing gate N6 and gate N7. Nets
2008 and 2010 each have fanout counts of one since they each only feed the
netlist node representing gate N3. The fanout counts annotated on the
remaining nets will be appreciated from the previous discussion. Thus, it will
be understood that a fanout count is stored for every net in the netlist stored
in the electronic memory.

A depth-first traversal is a technique to "process" all the netlist nodes
in the electronic memory such that nodes at prior or deeper levels of logic are
processed before nodes at subsequent or shallower levels. Netlist nodes at
subsequent or shallower levels of a netlist often are referred to as parent nodes
of netlist nodes that feed into them from the next prior or deeper logic level.
In a depth-first traversal, all child nodes of a parent node are "processed"
before that parent node is processed. In the current embodiment that means
that BDDs for child nodes are constructed and have switching activity values
computed for them before a BDD is constructed for the parent node.
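
A child-before-parent ordering of this kind can be sketched as a recursive
post-order walk. The function name depth_first_order and the fanin dictionary
are illustrative assumptions; BDD construction itself is left out.

```python
def depth_first_order(fanin, root, done=None):
    """Return gates reachable from root in child-before-parent order.

    fanin maps a gate name to the names feeding it; names with no entry are
    primary inputs and are not listed. `done` carries gates already processed
    for earlier outputs so that shared logic is visited only once.
    """
    done = set() if done is None else done
    order = []

    def visit(node):
        if node in done or node not in fanin:
            return
        done.add(node)
        for child in fanin[node]:
            visit(child)      # process every child first
        order.append(node)    # then the parent

    visit(root)
    return order
```

Passing the same `done` set for each ranked primary output in turn keeps a
gate from being processed twice when it feeds more than one output.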

A significant reason for ranking is to enable construction of BDDs
using fewer bytes of electronic memory. The ranking affects the size in
memory of BDDs. Larger BDDs increase the running time of the software power
estimation tool due to excessive paging of memory.

A depth-first traversal begins with PO0 based on the ranking illustrated
in Figure B. Processing of the stored netlist representing the exemplary
electronic circuit proceeds by first constructing a BDD for the netlist node
representing the deepest logic level gate that feeds PO0. Since the only gate
that feeds PO0 is N1, a BDD is constructed for the netlist node that represents
gate N1. Referring to the illustrative drawings of Figure D, there is shown an
illustrative BDD for the netlist node that represents gate N1. BDD (N1) is
substituted into the netlist in place of the netlist node representing N1. Values
then are calculated for static probability (SP) and toggle rate (TR) for the
constructed BDD (N1). Next, as illustrated in Figure E, the fanout counts of
the two nets that feed BDD (N1) each are decremented by one to indicate that
one of the netlist nodes fed by each of the fanouts has been processed. The
fanout counts that annotate the stored nets are used to monitor the processing
of netlist nodes fed by the nets.

After the depth-first traversal for PO0 has been completed, a depth-first
traversal begins for the next ranked primary output PO2. PO2 has the next
highest maximum logic level depth. Referring to the illustrative drawings of
Figure F, there is shown a portion of the combinational logic that feeds into
PO2. Specifically, there is shown gate N3 which is at the deepest logic level
feeding PO2. N3 is three logic levels below PO2. Gate N4 is two logic levels
below PO2. Gate N5 is one logic level below PO2. As part of the depth-first
traversal of PO2, BDD (N3) is substituted into the netlist for the netlist node
that represents N3. SP and TR values are computed for BDD (N3). The
fanout counts on nets 2008 and 2010 are decremented by one so that each now
equals 0.

Next, as indicated in Figure G, BDD (N4) is composed from BDD
(N3) and BDD (I9). BDD (N4) is composed in accordance with a logical OR
function consistent with the functionality of gate N4. Figure H illustrates the
substitution of BDD (N4) into the netlist for the netlist node representing gate
N4. SP and TR values are computed for BDD (N4). The fanout counts of
nets 2012 and 2014 each are decremented by one to indicate that one BDD fed
by each of these two nets has been constructed, and has had an SP and a TR
value computed for it. Since the fanout count of net 2012 is 0, BDD (N3) is
released from the electronic memory. Likewise, a BDD (I9) representing
primary input I9 can be released from memory. The release of BDD (N3) and
BDD (I9) frees memory for other uses such as construction of further BDDs
to replace further netlist nodes.
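
The fanout-count bookkeeping works like reference counting, and can be
sketched as below. Nets are identified here by the name of their driver, and
the helper names fanout_counts and process_gate are assumptions made for this
illustration only.

```python
from collections import Counter


def fanout_counts(fanin):
    """Count, for every net, how many gates take it as an input."""
    counts = Counter()
    for inputs in fanin.values():
        counts.update(inputs)
    return counts


def process_gate(gate, fanin, counts, live_bdds):
    """Mark `gate` as processed and free the BDDs of exhausted input nets.

    live_bdds maps a net name to its BDD (any object in this sketch). The BDD
    for `gate` itself is assumed to have been built and stored by the caller.
    Returns the list of nets whose BDDs were released.
    """
    released = []
    for net in fanin[gate]:
        counts[net] -= 1
        # A count of zero means no unprocessed gate still needs this BDD.
        if counts[net] == 0 and net in live_bdds:
            del live_bdds[net]
            released.append(net)
    return released
```

In the example above, processing N4 drops the counts of its two input nets to
zero, so the BDDs for N3 and I9 would both be released.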

Referring to the illustrative drawing of Figure I, there is shown a
fragment of the electronic circuit which feeds PO1. A depth-first traversal of
the combinational logic that feeds PO1 is performed last, since PO1 has the
largest maximum logic level depth. In Figure I, it is presumed that
depth-first traversal has progressed to the point that: BDD (N6) has been
substituted for the netlist node that represents gate N6; BDD (N7) has been
substituted for the netlist node that represents gate N7; and BDD (N8) has
been substituted for the netlist node that represents gate N8. It is also
presumed that during the construction of BDD (N9), there is an overflow of
memory beyond the defined threshold value. That is, the amount of memory
occupied by BDDs has "blown up" beyond a user defined threshold. It is further
presumed in this example that the frontier includes BDD (N6), BDD (N7), and
BDD (N8).

In accordance with the techniques of the present invention, a BDD in
the frontier that feeds the netlist node representing gate N9 and which
occupies the largest amount of memory is released first. A determination is
made as to whether the memory freed through the release of that particular
BDD is sufficient to bring the memory usage below the threshold. If it is not,
then the BDD which occupies the next greatest amount of memory and that
feeds the netlist node that represents gate N9 is released from memory. A
further determination is made as to whether the release of this additional BDD
frees enough memory to bring BDD memory usage below the threshold.

It will be presumed that BDD (N6) and BDD (N7) occupy more
memory than BDD (N8), and that both were released before BDD memory
usage fell below the defined limit. Referring to the illustrative drawings of
Figure J, there is shown an exemplary drawing of the structure stored in the
electronic memory after the removal of BDD (N6) and the removal of BDD
(N7). BDD (N6) is replaced with pseudo primary input I10, and BDD (N7)
is replaced with pseudo primary input I11.

The substitution of pseudo primary inputs reduces the accuracy of the
power estimation because any correlation that may have existed between the
replaced node and any other netlist node is ignored in the subsequent analysis.

The technique for setting the memory threshold involves computing a
value which is a percentage (%) of the maximum allowed memory for the BDD
construction. This number is determined empirically through rigorous
experimentation. The goal is to release sufficient electronic memory to allow
the subsequent analysis to complete without running out of memory too often.
For example, if the maximum capacity set for a given circuit is 100 bytes then
the threshold may be 30%, i.e., 30% x 100 = 30 bytes.

2.4 Computing Toggle-rates in a Sequential Circuit: Overview

2.4.1 Constructing a State Element Graph (SEG)
Figure 5 shows the process for evaluating the toggle rates for a circuit
that includes sequential elements. The process begins with step 5010 by
obtaining a state element graph (SEG) for the circuit which represents the
sequential elements as nodes and combinational logic connections between the
sequential elements as directed arcs connecting the nodes. Figure 6 shows an
electronic circuit consisting of combinational gates (ANDs, ORs and
exclusive ORs), sequential gates (D flip-flops), and input/output ports.
Figure 7 shows the SEG derived from the circuit shown in figure 6. Initially
there is a node for each sequential element. For example, flip-flops n1
through n9 in figure 6 become nodes s1 through s9 in figure 7. A directed
arc connects sequential element i to sequential element j if there is a
combinational logic path from the output of sequential element i to an input
of sequential element j. For example, there is an arc between nodes s1 and
s5 in figure 7 because of a combinational path (through an exclusive-OR) from
flip-flop n1 to flip-flop n5 in figure 6. The design's primary input ports and
primary output ports are also represented as nodes in the SEG with appropriate
arcs to nodes that correspond to sequential elements that are connected to the
ports through combinational logic. For example, nodes s_in1 to s_in4 in figure 7
correspond to input ports in1 to in4 in figure 6. Similarly, nodes s_o1 to s_o3
in figure 7 correspond to output ports o1 to o3 in figure 6.
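
The arc rule above (an arc from i to j whenever a purely combinational path
leads from i to j) can be sketched as a reachability search that stops at
state elements and ports. The function name build_seg and the `drives`
adjacency dictionary are assumptions for illustration.

```python
def build_seg(drives, endpoints):
    """Build a state element graph as {node: set(successor nodes)}.

    drives maps each cell or port name to the cells/ports it feeds directly.
    endpoints is the set of names that become SEG nodes (sequential elements
    and ports); every other name is treated as a combinational gate.
    """
    seg = {name: set() for name in endpoints}
    for start in endpoints:
        stack = list(drives.get(start, ()))
        seen = set()
        while stack:
            cell = stack.pop()
            if cell in seen:
                continue
            seen.add(cell)
            if cell in endpoints:
                seg[start].add(cell)  # path ends at a state element or port
            else:
                stack.extend(drives.get(cell, ()))  # keep walking through gates
    return seg
```

A flip-flop whose output reaches its own input through gates only ends up
with itself in its successor set, which is how self-loops appear in the SEG.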

The state element graph formed in step 5010 can contain cycles (also
referred to as loops). A cycle exists if a path exists from a node back to itself
traversing one or more directed arcs. Cycles can be self-loops where a node
has an arc that originates and terminates at itself, or a cycle can consist of
multiple cells. For example, there are two loops in the SEG shown in
figure 7: one loop that goes through nodes s1 and s5, and one self-loop around
node s6. Whether they are self-loops or multiple-cell loops, cycles must be
treated specially, as the objective of the SEG is to transform the circuit into an
acyclic representation of the circuit to enable serial processing of the design.

2.4.2 Flagging self-loops in the SEG 

Before any changes are performed on the SEG, all self-loop nodes are
flagged in step 5015. For example, the loop around node s6 is flagged during
step 5015. By being able to distinguish nodes that have self-loops, the
sequential propagation (step 5120) can be streamlined for the common case of
non-self-loop nodes. This will be discussed further in Section 2.4.5.

2.4.3 Breaking cycles in the SEG 

In step 5020, every cycle in the graph is broken by choosing one node
of the cycle. When appropriate, the same node can be chosen to break
multiple loops if the same node is contained in multiple loops. For example,
in figure 7, the loop through nodes s1 and s5 is broken by choosing either
node s1 or node s5. The self-loop around node s6 has to be broken by choosing
node s6. After the selected nodes are chosen, each selected node is replaced
with two nodes called the loop source node and the loop sink node. The arcs
that terminated at the selected node are instead routed to that selected
node's loop sink node. The arcs that originated from the selected node are
instead connected to the corresponding loop source node. For example,
figure 8 shows a SEG after loops have been broken. Nodes as2 to as5
correspond to nodes s2 to s5 in figure 7. Nodes as1_s and as1_l in figure 8
represent the source and loop versions of node s1 in figure 7. Similarly,
nodes as6_s and as6_l in figure 8 correspond to the source and loop versions
of node s6 in figure 7. After replacing all selected nodes, the state element
graph will become acyclic.
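
The node replacement can be sketched as a rewrite of the successor sets. The
function name break_cycles is an assumption, and the "_s" and "_l" suffixes
simply mirror the naming seen in figure 8. The sketch assumes every node
appears as a key of the input dictionary.

```python
def break_cycles(seg, selected):
    """Split each selected node into a loop source and a loop sink node.

    seg is {node: set(successors)}. For a selected node n the result holds
    "n_s" (source, keeps the outgoing arcs) and "n_l" (sink, receives the
    incoming arcs). Unselected nodes keep their names.
    """
    def src(n):
        return f"{n}_s" if n in selected else n

    def dst(n):
        return f"{n}_l" if n in selected else n

    mseg = {}
    for node, successors in seg.items():
        # Outgoing arcs leave from the source version of the node and
        # arrive at the sink version of each successor.
        mseg.setdefault(src(node), set()).update(dst(s) for s in successors)
        if node in selected:
            mseg.setdefault(dst(node), set())  # a sink has no outgoing arcs
    return mseg
```

Because a sink node never has outgoing arcs, every cycle that passed through
a selected node is cut at that node.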

One method for identifying the selected nodes to break the SEG is as
follows. First, work with a copy of the SEG. Determine which nodes to
select to break the SEG in the steps below using the copy.

Second, identify any node that has an arc that comes back to itself.
Mark every such self-loop node as a selected node, delete it from the copy of
the SEG, and delete all arcs originating from it and entering into it. For
example, node s6 in figure 7 is marked as a self-loop and then deleted. This
will reduce the size of the SEG copy. As mentioned previously, the objective
of this stage is to transform a cyclic SEG into an acyclic one. As nodes in the
SEG are processed and deleted from the SEG, the size of the SEG will
be reduced until the eventual stage when no nodes are left in the SEG
copy. At that point, every cycle in the SEG has been broken.

Third, find every node that has no arcs entering it. Delete that node,
and delete all arcs leaving such nodes. Again, this results in a smaller SEG.
Repeat the third step on the compacted SEG until every node has arriving
arcs.

Fourth, find every node that has no arcs originating from it. Delete
that node, and delete all arcs entering such nodes. This too results in a
smaller SEG. Repeat the fourth step on the compacted SEG until every node
has departing arcs.

Fifth, identify every node that has exactly one arc entering and one arc
leaving (and that isn't a self-loop node). Delete that node and the departing
arc. Reroute the arriving arc to the node where the departing arc went. Note
that the new destination may be the node where the arriving arc originated.
Repeat this step on the SEG until there are no nodes with exactly one arc
entering and one arc leaving.

Sixth, if any node at this point has a self-loop, mark it as a selected
node, and delete it as was done in the second step, and return to the third
step.

Seventh, if the sixth step did not result in the deletion of at least one
node, identify the node that has the largest sum of the number of arcs entering
and exiting. Mark that node as a selected node and delete it from the graph.
Also delete any arcs entering or leaving the deleted node. Return to the third
step. Repeat the third through seventh steps until there are no nodes left in
the SEG copy. The selected nodes are the ones to break the original SEG into a
directed acyclic graph (DAG).
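
The seven steps can be sketched in Python as follows. This is a simplified
reading, not the exact procedure: the third, fourth and fifth steps are
applied together until nothing changes rather than strictly one after the
other, parallel arcs are merged because successors are held in sets, and the
function name select_loop_breakers is an assumption.

```python
def select_loop_breakers(seg):
    """Pick nodes whose removal leaves the SEG acyclic."""
    succ = {n: set(s) for n, s in seg.items()}   # first step: work on a copy
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    selected = []

    def delete(n):
        for t in succ.pop(n):
            if t != n:
                pred[t].discard(n)
        for p in pred.pop(n):
            if p != n:
                succ[p].discard(n)

    def take_self_loops():                       # second and sixth steps
        loops = [n for n in succ if n in succ[n]]
        for n in loops:
            selected.append(n)
            delete(n)
        return bool(loops)

    take_self_loops()
    while succ:
        changed = True
        while changed:                           # third, fourth, fifth steps
            changed = False
            for n in list(succ):
                if n not in succ:
                    continue
                if not pred[n] or not succ[n]:   # no arriving or departing arcs
                    delete(n)
                    changed = True
                elif (len(pred[n]) == 1 and len(succ[n]) == 1
                      and n not in succ[n]):
                    (p,), (t,) = pred[n], succ[n]
                    delete(n)
                    succ[p].add(t)               # reroute; may form a self-loop
                    pred[t].add(p)
                    changed = True
        if take_self_loops():
            continue
        if succ:                                 # seventh step
            n = max(succ, key=lambda m: len(pred[m]) + len(succ[m]))
            selected.append(n)
            delete(n)
    return selected
```

On a graph shaped like the figure 7 example (a two-node loop plus a node with
a self-loop), the sketch selects the self-loop node and one node of the
two-node loop.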

Another method for breaking cycles in a graph is given in Introduction
to Algorithms by Thomas H. Cormen, Charles E. Leiserson and Ronald L.
Rivest on pages 477-483. The book was published in 1993, has ISBN
0-262-03141-8, and is hereby incorporated by reference.

2.4.4 Processing the SEG

Step 5020 produces a modified state element graph (MSEG) from the
SEG. Because the cycles are broken, the MSEG is a directed acyclic graph.
The MSEG is used as an acyclic representation of the circuit to allow serial
propagation of static probabilities and toggle rates from the MSEG's primary
inputs to the MSEG's primary outputs. The MSEG's primary inputs consist
of the design's primary input ports as well as outputs of sequential elements
that were selected to break cycles in the original SEG. The MSEG's primary
outputs consist of the design's primary output ports as well as inputs of
sequential elements that were selected to break cycles in the original SEG.

The serial processing of the MSEG can be performed in two ways
which trade off complexity versus efficiency. The first approach will be
referred to as "Uniform MSEG Processing" because it always propagates
static probabilities and toggle rates for every cell regardless of the step in the
MSEG processing that is being performed. The second approach, referred to
as "Modal MSEG Processing", is more efficient than the first approach, but
it involves distinguishing the mode of the propagation based on the step in the
MSEG processing that is being performed.

The Uniform MSEG processing strategy will be described first,
followed by a description of how the Modal MSEG processing strategy differs
from the Uniform processing strategy.

2.4.4.1 Uniform MSEG Processing
Step 5020 produces a modified state element graph (MSEG)
from the SEG. Because the cycles are broken, the MSEG is a directed acyclic
graph. In step 5030, the nodes of the MSEG are labeled with their
appropriate level numbers. To do this, label each node having no inputs with
0. The level number of a node is 0 if it has no predecessor; otherwise its level
number is one more than the maximum level number of any of that node's
immediate predecessors.
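
The level rule of step 5030 can be sketched as a memoized recursion over
predecessors. The function name levelize and the successor-set representation
are assumptions for illustration; the input is assumed to be acyclic.

```python
def levelize(mseg):
    """Assign level numbers to the nodes of an acyclic graph.

    mseg is {node: set(successors)}. A node with no predecessor gets level 0;
    any other node gets one more than the highest level among its
    immediate predecessors.
    """
    preds = {n: set() for n in mseg}
    for n, targets in mseg.items():
        for t in targets:
            preds.setdefault(t, set()).add(n)

    level = {}

    def level_of(n):
        if n not in level:
            # max(...) over no predecessors falls back to -1, giving level 0.
            level[n] = 1 + max((level_of(p) for p in preds[n]), default=-1)
        return level[n]

    for n in preds:
        level_of(n)
    return level
```

Nodes that share a level number have no arcs between them, so the
combinational logic ending at them can be processed in the same pass.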

In step 5040, assign static probabilities and toggle rates to the inputs
of the combinational logic circuits corresponding to the arcs in the MSEG that
originate from any level 0 node or from any primary input. The static
probabilities and toggle rates could be user specified, they could be estimated
from simulation, or they could be chosen arbitrarily. Define a level counter
and set it equal to one.

In step 5050, compute the toggle rates and the static probabilities of the
internal nets of the combinational logic that terminates at a node whose level
number equals the current level counter value using the methods described in
Figure 4.

In step 5060, compute the toggle rates and the static probabilities of the
outputs of the sequential elements at nodes with level equal to the level
counter value. As described in Section 2.4.5, the toggle rates and static
probabilities of the outputs of sequential elements are computed as a function
of the toggle rates and static probabilities of the sequential element's inputs.
After this, increment the level counter and repeat steps 5050 and 5060
until all of the levels have been processed.

At this point, the static probabilities and toggle rates have been
computed for every net in the design. This method will produce static
probabilities and toggle rates at the output of the loop sink nodes. Recall that
each node in the state element graph that was selected to break cycles in the
state element graph has a loop source node and a loop sink node. These
nodes correspond to the same sequential element, and therefore the output of
a loop sink node and the output of the loop source node correspond to the
same physical point in the circuit (an output of a sequential cell), and they
should have the same static probabilities. However, because the initial
probabilities for the selected node loop source nodes (assigned in step 5040)
may have been arbitrary estimates or defaults, a single pass through the design
(steps 5050-5060) will generally not yield convergence of static probability
values for the selected nodes. Therefore, iterate on the entire design until the
static probabilities of the loop sink nodes and loop source nodes converge.

Step 5070 reduces the MSEG to eliminate nodes that cannot be
affected by iteration. Step 5070 constructs the reduced modified state element
graph (RMSEG). As was described in step 5020, the selected node set
contains all nodes that were chosen to break cycles in the SEG. In the
MSEG, every selected node actually consists of two nodes (a source node and
a loop node) that correspond to a single sequential element. Construction of
the RMSEG starts by determining the nodes that can be reached from the
selected node source nodes. A particular node is reached if there is a path in
the MSEG from a selected node source node to that particular node. All
unreached nodes can be deleted. In addition, the nodes that are not part of
any path leading to a loop sink node can be temporarily deleted until the
iteration is complete. The RMSEG should be relevelized using the method of
step 5030. Also, set an iteration count equal to zero as shown in step 5085.
Step 5080 determines whether the static probabilities at the output of
the selected node sink nodes are sufficiently close to the static probabilities at
the output of the selected node source nodes. If the smaller value is within
a certain tolerance (e.g. 1%) of the larger value, then the sequential cell's
values are assumed to have converged. If the static probabilities have
converged for all of the selected nodes in the MSEG, or if the number of
iterations (through the loop comprising step 5080, step 5090, step 5105, step
5110, step 5120, and step 5130) has exceeded a predefined threshold number,
then the iteration is terminated.
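
The test of step 5080 can be sketched as below. Reading "within a certain
tolerance of the larger value" as a relative difference measured against the
larger value is an interpretation, and the function names and the default
iteration limit of 20 are hypothetical.

```python
def has_converged(sink_sp, source_sp, tolerance=0.01):
    """True if sink and source static probabilities agree for every node.

    Both arguments map a selected node name to a static probability. Two
    values agree when the smaller lies within `tolerance` (relative) of the
    larger. This relative reading is an assumption.
    """
    for node, sink in sink_sp.items():
        small, large = sorted((sink, source_sp[node]))
        if large - small > tolerance * large:
            return False
    return True


def should_stop(sink_sp, source_sp, iteration, max_iterations=20):
    """Terminate on convergence or once the iteration limit is exceeded."""
    return iteration > max_iterations or has_converged(sink_sp, source_sp)
```

When should_stop returns False, the sink values would be copied to the
corresponding source outputs and another pass made over the reduced graph.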

If the result of step 5080 terminates the iteration, then the toggle rates
and static probabilities need to be propagated through the nodes that were
temporarily deleted (in step 5070) because they were not part of a path leading
to a loop sink node. Step 5094 percolates the toggle rates and static
probabilities through the remainder of the circuit as was done with the MSEG
in steps 5050 and 5060.

If, on the other hand, the result of step 5080 indicates that the static
probabilities of the selected sequential cells have not converged to steady-state
values, then the iteration must continue. In that case, step 5105 transfers the
static probabilities and toggle rates from the loop sink node output to the
corresponding loop source node output. The RMSEG is processed in step
5110 and step 5120 as the MSEG was processed in step 5050 and step 5060.

After completing one iteration of the RMSEG, the iteration count is
incremented in step 5130 and the convergence criterion is re-evaluated in step
5080.

2.4.4.2 Extensions for Modal MSEG Processing
During Modal MSEG processing, every net that feeds into an
MSEG node (e.g. sequential input nets) can be in one of two modes:

1) "sp-only" mode: Under this mode, MSEG processing only
propagates static probabilities (not toggle rates) for an endpoint
net and all of the nets in the transitive fanin of its
combinational paths.

2) "sp-and-tr" mode: Under this mode, MSEG processing
propagates both static probabilities and toggle rates for an
endpoint net and all of the nets in the transitive fanin of its
combinational paths.

These modes modify the behavior of steps 5040, 5050-5060, 5110-5120,
and 5094. Otherwise, the MSEG processing steps described in
Section 2.4.5.3 are unaffected.

When Modal processing is enabled, step 5040 must also determine the
mode of every endpoint net for each level. These endpoint nets represent
inputs of sequential cells or primary output ports of the design. The mode of
each such output net defaults to "sp-only". However, if the net is being used
to drive any asynchronous logic (e.g. asynchronous preset, latch enable, or
flip-flop clock), the net's mode is set to "sp-and-tr". The mode of a net
applies to that net and all of its transitive fanin nets in combinational logic
paths that feed that endpoint. Distinguishing the nets in this manner is valid
because the toggle rates of synchronous sequential inputs do not affect the
toggle rate of the sequential cell's output(s). Therefore, it is unnecessary to
spend the time to compute those toggle rates during the MSEG processing,
and, consequently, significant processing time can be saved. For example, for
a standard D flip-flop with two inputs, D and CLK, and one output, Q, the
toggle rate of Q is a function of the static probability of D and the toggle rate
of CLK, but not the toggle rate of D. The formulation for propagating static
probabilities and toggle rates across sequential elements is described further
in Section 2.4.5.
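
The mode assignment can be sketched as a small lookup. The pin names in
ASYNC_PINS, the (net, pin) pair representation, and the function name
endpoint_modes are hypothetical; a real cell library would identify
asynchronous and clock pins in its own way.

```python
# Hypothetical pin names for inputs whose toggle rate affects the cell output.
ASYNC_PINS = {"CLK", "PRESET", "CLEAR", "ENABLE"}


def endpoint_modes(endpoints):
    """Map each endpoint net to "sp-only" or "sp-and-tr".

    endpoints is an iterable of (net, pin) pairs, one per sequential-cell
    input pin driven by the net (pin is None for a primary output port).
    """
    modes = {}
    for net, pin in endpoints:
        modes.setdefault(net, "sp-only")   # default mode
        if pin in ASYNC_PINS:
            modes[net] = "sp-and-tr"       # toggle rate is needed here
    return modes
```

A net that feeds both a D input and a clock or enable pin ends up in
"sp-and-tr" mode, since one asynchronous use is enough to require its
toggle rate.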

Depending on the mode of an endpoint net, it will be handled
differently by steps 5050-5060 and steps 5110-5120. If an endpoint was
marked as an "sp-only" net, then the combinational propagation strategy only
computes the static probability of that net. This enables significant run-time
improvements since toggle rate values don't need to be computed for that
endpoint nor for any of the nets in the transitive fanin of the combinational
path that feeds that endpoint. If, however, the endpoint is marked as an
"sp-and-tr" net, the net will be processed normally as described in Section
2.4.4.1 (steps 5050-5060 and steps 5110-5120).

When Modal processing is enabled, step 5094 is extended to operate
on all nets in the design. Normally, after step 5090 terminates iteration on the
MSEG, step 5094 only operates on nets that are in the transitive fanout of the
selected node sink nodes to ensure that all nets in the design are annotated
with valid static probability and toggle rate values. If Modal processing is
enabled, step 5094 instead operates on all nets in the design. This ensures
that toggle rates are computed for any nets which may have only had their
static probabilities computed during the Modal MSEG processing.



2.4.4.3 Additional Examples of SEG and MSEG Processing
Referring to Figures Ja and Jb, there are shown simplified
illustrative drawings of a sequential element graph (SEG) and a corresponding
modified sequential element graph (MSEG). Referring to Figure Ja, the SEG
includes a node n which receives an input from node A and provides an output
to node B. The node n also has a directed self-loop arc 3000. The MSEG is
presumed to correspond to a netlist which is not shown. In particular, the
node n corresponds to a sequential element. The self-loop arc 3000
corresponds to a group of netlist nodes that represent a group of combinational
logic that propagates both to and from the sequential element to which node
n corresponds. The directed arc 3002 directed from node A to node n also
corresponds to a group of netlist nodes that represent a group of combinational
logic that propagates signals from input A to the node. The directed arc 3004
between node n and node B corresponds to yet another group of netlist
nodes that represent another group of combinational logic that propagates
signals from node n to node B.

In Figure Jb, the node n has been split into two nodes, n_s and n_l.
New pseudo primary input node 3006 has been created, and a new pseudo
primary output node 3008 has been created. A directed arc 3002', which has
its origin at node 3006 and its destination at the split load node n_l,
corresponds to arc 3002. A directed arc 3000' has its origin at the split
source node n_s and its destination at the split load node n_l. A directed arc
3004' has its origin at the split source node and its destination at node 3008.

The split source node n_s and the split load node n_l both represent the
same node: n. Thus, they both correspond to the same single sequential
element that corresponds to node n. Moreover, directed arcs 3002 and 3002'
both correspond to the same group of combinational logic represented by the
same group of netlist nodes stored in memory. Similarly, the directed arcs
3000 and 3000' both correspond to the same group of combinational logic that
is represented by the same group of netlist nodes stored in memory. Finally,
the directed arcs 3004 and 3004' both represent the same group of
combinational logic represented by the same netlist nodes stored in electronic
memory.

The creation of two split nodes n_s and n_l provides a guide in the
form of the MSEG for serial processing of the netlist stored in the electronic
memory. Self-loop arc 3000 has been replaced by an acyclic arc 3000'.
Thus, there are no more cycles in the MSEG. The techniques of the present
invention advantageously use an MSEG as a guide for serial processing of
cyclic sequential circuits for the purpose of estimating average power
consumption in accordance with the invention.

Referring to the illustrative drawings of Figures Ka and Kb, there is
shown a SEG and a corresponding MSEG. The SEG includes nodes n1-n4,
IA and OB, in a directed graph as shown. Nodes n1-n4 form a loop. Hence,
the SEG in Figure Ka represents a cyclic graph. The loop is broken by
removing node n1 and substituting in place of it source node n1_s and load
node n1_l. The directed arc 3010, which originates at n4 and has a destination
at n1, is replaced by arc 3010', which originates at n4 and has a destination at
n1_l. In practice, directed arc 3010' can be produced by merely changing its
destination pointer to indicate that n1_l is its new destination. Similarly,
directed arc 3012 is replaced by directed arc 3012'. Arc 3014 is replaced by
arc 3014', and arc 3016 is replaced by arc 3016'.
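The node-splitting step can be illustrated with a minimal Python sketch. The arc-list representation and the function name split_node are assumptions made only for this illustration; the description above states only that an arc's destination pointer is changed to the new split load node.

```python
def split_node(arcs, node):
    """Break cycles through `node` by replacing it with a split source node
    (node_s) and a split load node (node_l).

    arcs is a list of (origin, destination) pairs. Arcs leaving `node` are
    re-rooted at node_s; arcs entering `node` are re-pointed at node_l.
    """
    source, load = f"{node}_s", f"{node}_l"
    new_arcs = []
    for origin, destination in arcs:
        new_origin = source if origin == node else origin
        new_destination = load if destination == node else destination
        new_arcs.append((new_origin, new_destination))
    return new_arcs, source, load


# The SEG of Figure Ka: n1 -> n2 -> n3 -> n4 -> n1, plus IA -> n1 and n1 -> OB.
seg_arcs = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"),
            ("n4", "n1"), ("IA", "n1"), ("n1", "OB")]
mseg_arcs, n1_s, n1_l = split_node(seg_arcs, "n1")
```

Under these assumptions, a self-loop arc (n, n) becomes the acyclic arc (n_s, n_l), which corresponds to the replacement of arc 3000 by arc 3000' in Figure Jb.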

Since there are no cycles in the MSEG, the combinational logic
between the sequential elements represented by the nodes n1_s, n1_l, and
n2-n4 can be processed serially to produce switching activity values. However,
before such processing can occur, a determination must be made as to which
groups of netlist nodes can be processed together. This grouping of netlist
nodes to be processed together is accomplished through a levelization process
described in relation to Figure L.

Primary inputs such as IA are set to level L0. n1_s also is grouped at
L0 since it has no arcs directed to it. Since n2 receives its sole directed arc
3012' from n1_s, n2 is at L1. n3 receives its sole directed arc 3018 from n2.
Hence, n3 is at L2. n4 receives its only directed arc 3020 from n3. Thus,
n4 is at L3. n1_l receives directed arc 3010' from n4, which is at L3. n1_l
also receives directed arc 3014' from IA, which is at L0. Since the highest
level node that n1_l receives an arc from is at L3, n1_l is placed at L4.
Finally, OB receives a directed arc 3016' from n1_s. Consequently, OB is at L1.
The following chart summarizes the levelization results.



LEVEL    NODES
L0       IA, n1_s
L1       n2, OB
L2       n3
L3       n4
L4       n1_l
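The levelization rule described above (a node with no incoming arcs is at L0; any other node is one level above its highest-level predecessor) can be sketched as follows. This is an illustrative sketch only; the function name levelize and the arc-list representation are assumptions, not part of the described implementation.

```python
from collections import defaultdict


def levelize(arcs, nodes):
    """Return {node: level} for an acyclic graph.

    A node with no incoming arcs gets level 0. Any other node gets
    1 + the highest level among the nodes that feed it.
    """
    predecessors = defaultdict(list)
    for origin, destination in arcs:
        predecessors[destination].append(origin)

    levels = {}

    def level_of(node):
        if node not in levels:
            levels[node] = 1 + max(
                (level_of(p) for p in predecessors[node]), default=-1
            )
        return levels[node]

    for node in nodes:
        level_of(node)
    return levels


# MSEG of Figure Kb, using the arc endpoints named in the text.
mseg_arcs = [("n1_s", "n2"),   # 3012'
             ("n2", "n3"),     # 3018
             ("n3", "n4"),     # 3020
             ("n4", "n1_l"),   # 3010'
             ("IA", "n1_l"),   # 3014'
             ("n1_s", "OB")]   # 3016'
mseg_nodes = ["IA", "n1_s", "n2", "n3", "n4", "n1_l", "OB"]
levels = levelize(mseg_arcs, mseg_nodes)
```

Because each arc is placed in the level of the node it feeds, the level of an arc in this sketch would simply be levels[destination].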



Once the MSEG has been levelized, the netlist stored in electronic
memory can be processed to estimate the average power consumption. For
example, L1 processing begins by computing activity values for the groups of
netlist nodes that correspond to arcs 3012' and 3016'. Arcs 3012' and
3016' are grouped in L1 since they feed nodes n2 and OB, respectively, in
L1. The starting SP and TR values provided by n1_s are assigned and are
refined iteratively as explained below.

L2 processing commences as the activity values computed for the
group of netlist nodes that correspond to directed arc 3012' are transferred
across n2 and are used as primary inputs during the computation of activity
values for the group of netlist nodes that correspond to the directed arc
3018, which originates at n2 and terminates at n3. Arc 3018 is in L2 since
it feeds n3, which is in L2.

L3 processing starts as the activity values computed in connection with
arc 3018 are transferred across n3 and are used as primary inputs for the
computation of activity values for the group of netlist nodes that correspond
to arc 3020. Arc 3020 is in L3 since n4, which receives arc 3020, is in L3.

L4 processing begins as the activity values calculated in connection
with arc 3020 are transferred across n4 and are used as primary inputs in
connection with the computation of activity values for a group of netlist nodes
that correspond to arc 3010'. Similarly, the primary input IA is used in the
computation of activity values for the group of netlist nodes that correspond
to the arc 3014'. Arcs 3010' and 3014' each are in L4 since n1_l, which they
both feed, is in L4. When activity values have been computed for the entire
MSEG, a comparison is made between the activity values originally assigned
to n1_s and the activity values computed for n1_l. If they have not converged
to within a predefined threshold value, the values computed for n1_l are
used in a next iteration as assigned values for n1_s. The entire process
described above repeats. If at the end of the process the assigned values for
n1_s have not converged sufficiently with the computed values of n1_l, then,
once again, the newly computed input values to n1_l become the assigned
values for n1_s during a next iteration of the process. This iterative process
repeats until the computed values of n1_l converge with the assigned values
of n1_s, or until the system has reached a predefined maximum number of
allowable iterations.

A reason for seeking convergence is that nodes n1_s and n1_l in fact
represent the same node n1. Thus, the assigned value of that node's split
source n1_s should be the same as the computed value of that node's split load
n1_l. If the values are different, then there may have been significant error
introduced by splitting node n1. The iteration process aims to reduce that
error through convergence of assigned n1_s values and computed n1_l values.
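The iteration on a split node can be pictured as a simple fixed-point loop. The sketch below is only an illustration under stated assumptions: the propagate callback stands in for a full level-by-level pass over the MSEG, a single scalar stands in for the SP and TR values, and the threshold and iteration limit are hypothetical defaults.

```python
def iterate_split_node(propagate, initial, threshold=1e-3, max_iterations=50):
    """Refine the value assigned to a split source node (n1_s).

    propagate(assigned) performs one pass through the levelized graph and
    returns the value computed at the split load node (n1_l). The loop stops
    when the assigned and computed values agree to within `threshold`, or
    when `max_iterations` passes have been made.
    Returns (value, iterations_used, converged).
    """
    assigned = initial
    for iteration in range(1, max_iterations + 1):
        computed = propagate(assigned)
        if abs(computed - assigned) <= threshold:
            return computed, iteration, True
        assigned = computed  # computed n1_l value becomes the next n1_s value
    return assigned, max_iterations, False


# Toy stand-in for one pass over the circuit (a contraction with fixed point 0.4).
def toy_pass(static_probability):
    return 0.5 * static_probability + 0.2


value, iterations, converged = iterate_split_node(toy_pass, initial=0.5)
```

In this toy case the loop converges toward 0.4. A real pass need not be a contraction, which is why the maximum iteration count is kept as a guard.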

2.4.5 Transferring Static Probabilities and Toggle-rates Across
Sequential Elements

Step 5060 helps establish accurate static probabilities and toggle rates
in electronic circuits containing sequential elements like flip-flops and latches.
It involves computing the static probabilities and the toggle rates of the output
of a sequential element from the static probabilities and toggle rates of the
inputs. Step 5060 is decomposed into 5 sub-tasks explained in detail below.

The first task is to identify a generic sequential element that can
capture the general synchronous and asynchronous behaviors of many types
of commonly encountered sequential elements.

The second task is to describe each type of commonly encountered
sequential element as combinational logic connected to the inputs of the
generic sequential element selected in the first task.

The third task is to characterize the toggle rates and the static
probabilities of the outputs of the generic sequential element as relatively
simple functions of the inputs.

The fourth task is to replace each actual sequential element in the
circuit with its generic equivalent.

The fifth task is to use the methods for computing static probabilities
and toggle rates described in earlier sections to compute each sequential
element's static probability and toggle rate.

Each of these tasks will be described in turn in the following sections.

2.4.5.1 Task 1: Defining and Using a Generic Sequential Element

This section introduces a model for a sequential element. This
model can represent all flip-flops and latches, and in general any sequential
element that consists of a single state. Sequential elements that encompass
multiple states, like Master-Slave latches, counters and RAMs, are not
covered by the model. That is, multiple state sequential elements cannot be
represented by a single instance of the proposed model. However, they can
be represented by multiple instances of the general sequential model. The
generic model of a sequential element (GEN) is a cell with 6 inputs and 2
outputs. Table 1 explains the meaning of these pins. Table 2 describes
commonly used sequential elements using this model.



Pin     Type     Function
sync    Input    synchronous behavior of cell (f_sync) is input to this pin
ck      Input    function driving the clock pin (f_clk)
f00     Input    asynchronous behavior resulting in Q=0, QB=0 (f00)
f01     Input    asynchronous behavior resulting in Q=0, QB=1 (f01)
f10     Input    asynchronous behavior resulting in Q=1, QB=0 (f10)
f11     Input    asynchronous behavior resulting in Q=1, QB=1 (f11)
Q       Output   output function 1
QB      Output   output function 2

Table 1: Generic sequential cell (GEN)





Cell                    Nu  f_sync                                    f_clk  f00    f01        f10      f11
D flop                  1   D                                         CK     0      0          0        0
Scan D                  2   D*!T + I*T                                CK     0      0          0        0
D w/clear               3   D                                         CK     0      CL         0        0
Scan D w/clear          4   D*!T + I*T                                CK     0      CL         0        0
D w/clear/set           5   D                                         CK     CL*ST  CL*!ST     !CL*ST   0
Scan D cl/set           6   D*!T + I*T                                CK     CL*ST  CL*!ST     !CL*ST   0
D flop w/set            7   D                                         CK     0      0          ST       0
Scan D w/set            8   D*!T + I*T                                CK     0      0          ST       0
JK flop                 9   !J*!K*Q + J*!K + J*K*!Q                   CK     0      0          0        0
Scan JK                 10  !J*!K*!T*IQ + J*!K*!T + J*K*!T*!IQ + I*T  CK     0      0          0        0
JK w/clear              11  !J*!K*Q + J*!K + J*K*!Q                   CK     0      CL         0        0
Scan JK w/clear         12  !J*!K*!T*IQ + J*!K*!T + J*K*!T*!IQ + I*T  CK     0      CL         0        0
JK flop w/clear/set     13  !J*!K*Q + J*!K + J*K*!Q                   CK     CL*ST  CL*!ST     !CL*ST   0
Scan JK w/clear/set     14  !J*!K*!T*IQ + J*!K*!T + J*K*!T*!IQ + I*T  CK     CL*ST  CL*!ST     !CL*ST   0
Latch                   15  0                                         0      0      G*!D       G*D      0
Latch inv               16  0                                         0      0      !G*!D      !G*D     0
Latch w/clear           17  0                                         0      0      G*!D + C   G*D*!C   0
Latch inv w/clear       18  0                                         0      0      !G*!D + C  !G*D*!C  0
Sync en D               19  D*EN                                      CK     0      0          0        0
Sync enable feedback D  20  EN*(D*S + Q*!S)                           CK     0      0          0        0
T flop w/clear          21  !Q                                        CK     0      CL         0        0
T flop w/set            22  !Q                                        CK     0      0          ST       0
SR latch                23  0                                         0      S*R    !S*R       S*!R     0
set/clear D             24  D                                         CK     0      CL*!ST     !CL*ST   0
mux D w/clear           25  S*D + !S*Q                                CK     0      CL         0        0
Gated clock             26  D                                         E*C    S*R    !S*R       S*!R     0

Table 2: Commonly used sequential gates



Cell                    Nu  Sequential model equation (for Q plus)
D flop                  1   CK*!CKP*D + (!CK+CKP)*Q
Scan D                  2   CK*!CKP*(D*!T + I*T) + (!CK+CKP)*Q
D w/clear               3   (CK*!CKP*D + (!CK+CKP)*Q)*!CL
Scan D w/clear          4   (CK*!CKP*(D*!T + I*T) + (!CK+CKP)*Q)*!CL
D w/clear/set           5   (CK*!CKP*D + (!CK+CKP)*Q + ST)*!CL
Scan D cl/set           6   (CK*!CKP*(D*!T + I*T) + (!CK+CKP)*Q + ST)*!CL
D flop w/set            7   CK*!CKP*D + (!CK+CKP)*Q + ST
Scan D w/set            8   CK*!CKP*(D*!T + I*T) + (!CK+CKP)*Q + ST
JK flop                 9   CK*!CKP*(!J*!K*Q + J*!K + J*K*!Q) + (!CK+CKP)*Q
Scan JK                 10  CK*!CKP*(!J*!K*!T*Q + J*!K*!T + J*K*!T*!Q + I*T) + (!CK+CKP)*Q
JK w/clear              11  (CK*!CKP*(!J*!K*Q + J*!K + J*K*!Q) + (!CK+CKP)*Q)*!CL
Scan JK w/clear         12  (CK*!CKP*(!J*!K*!T*Q + J*!K*!T + J*K*!T*!Q + I*T) + (!CK+CKP)*Q)*!CL
JK flop w/clear/set     13  (CK*!CKP*(!J*!K*Q + J*!K + J*K*!Q) + (!CK+CKP)*Q + ST)*!CL
Scan JK w/clear/set     14  (CK*!CKP*(!J*!K*!T*Q + J*!K*!T + J*K*!T*!Q + I*T) + (!CK+CKP)*Q + ST)*!CL
Latch                   15  G*D + !G*Q
Latch inv               16  !G*D + G*Q
Latch w/clear           17  (G*D + !G*Q)*!CL
Latch inv w/clear       18  (!G*D + G*Q)*!CL
Sync en D               19  !CK*CKP*D*EN + (CK + !CKP)*Q
Sync enable feedback D  20  !CK*CKP*EN*(D*S + Q*!S) + (CK+!CKP)*Q
T flop w/clear          21  (!CK*CKP*!Q + (CK+!CKP)*Q)*!CL
T flop w/set            22  (!CK*CKP*!Q + (CK+!CKP)*Q) + ST
SR latch                23  (Q + S)*!R
set/clear D             24  (!CK*CKP*D + (CK+!CKP)*Q)*(!CL+ST) + ST*!CL
mux D w/clear           25  !CK*CKP*(S*D + !S*Q) + (CK + !CKP)*Q
Gated clock             26  ((!C + !E)*(CP*EP)*D + (C*E + !CP + !EP)*Q + ST)*!CL

Table 3: Application of the sequential model equation

A sequential cell usually has two outputs, Q and QB. If Q and QB are
opposite to each other (e.g., flip-flops with no asynchronous behavior,
D-latches, etc.), Q and QB are said to be "related." When Q and QB are not
opposite to each other, the two outputs are said to be "unrelated."

A1. Assumption: The asynchronous functions are pairwise disjoint.

An important assumption made in the generic model is that for a given
input stimulus, at most one of the four asynchronous functions is equal to 1.
This assumption is valid because none of the outputs is ever driven to 0 and
1 at the same time. Assumption A1 implies that all asynchronous inputs to a
sequential element are always logically disjoint.
That is, applying the logic AND operation to any pair of asynchronous inputs
of the same sequential element would always produce the logic value '0'.
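Assumption A1 can be checked exhaustively for a given cell by enumerating its input combinations. The following is a minimal sketch, not the described implementation; the function name pairwise_disjoint and the use of Python callables to stand for the asynchronous functions are assumptions. The example functions are those listed for the D flip-flop with clear and set (entry 5 of Table 2).

```python
from itertools import combinations, product


def pairwise_disjoint(functions, variables):
    """Return True if no two functions are 1 for the same input assignment."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        for f, g in combinations(functions, 2):
            if f(env) and g(env):
                return False
    return True


# Asynchronous functions of a D flip-flop with clear (CL) and set (ST).
d_clear_set = [
    lambda e: e["CL"] and e["ST"],          # f00: Q=0, QB=0
    lambda e: e["CL"] and not e["ST"],      # f01: Q=0, QB=1
    lambda e: (not e["CL"]) and e["ST"],    # f10: Q=1, QB=0
    lambda e: False,                        # f11: never asserted
]
satisfies_a1 = pairwise_disjoint(d_clear_set, ["CL", "ST"])

# A deliberately overlapping pair, for contrast: both are 1 when CL=ST=1.
violates_a1 = not pairwise_disjoint(
    [lambda e: e["CL"], lambda e: e["ST"]], ["CL", "ST"]
)
```

Exhaustive enumeration is only practical for the handful of pins on a single cell, which is the case considered here.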

The formulation of a generic sequential model begins with the
introduction of the "plus" (+) operator. The "plus" operator is used to
represent the value of a variable or a function at an instant that is just after the
present time. To understand this new operator, consider some of its
properties. Let f be a function of n input variables (x1, ..., xn).

P1. If f is a constant-valued function, i.e., f is either a tautology or
the zero function, then f+ = f.

P2.

P3.

Given a variable x_i whose value is known at time t, x_i+ is just another
variable that denotes the value of x_i at a time (t + ε) that is just after t.
Equation E1, given below, presents a logic function that accurately captures
the value of the Q output of a sequential element at a time ε after the present.

$$Q^{+} = \left(\overline{f_{clk}} \cdot f_{clk}^{+} \cdot f_{sync} + \left(f_{clk} + \overline{f_{clk}^{+}}\right) \cdot Q\right) \cdot \overline{f_{00}} \cdot \overline{f_{01}} + f_{10} + f_{11} \qquad \text{(E1)}$$

The above model is what makes possible the computation of static
probabilities and toggle rates of the outputs of sequential elements, because it
relates each output's logic function to that of the inputs. In essence, the
model transforms a sequential element into a combinational one, which then
enables the use of the combinational techniques previously described.

2.4.5.2 Task 2: Describing Commonly Encountered Sequential Elements

Let us try to express a D flip-flop (Table 2) using this formulation.
A D flip-flop exhibits the following behavior: whenever the clock input (CK)
rises from 0 to 1, the output Q is equal to the value at the data pin D. At
all other times, the flip-flop stores its "previous" state. The "previous" state
of the flip-flop is the value at the Q output of the flip-flop. Q+ can be
written as:

$$Q^{+} = \overline{CK} \cdot CK^{+} \cdot D + \left(CK + \overline{CK^{+}}\right) \cdot Q \qquad \text{(E2)}$$



Consider a D-latch (Table 2). Note that a latch does not have any
synchronous behavior as per our sequential model. Assuming that the latch
has a data pin D and an enable pin G, we can write the equation for a latch
in the following form:

$$Q^{+} = Q \cdot \overline{G} + D \cdot G \qquad \text{(E3)}$$

Equation E3 states that the output of a latch is 1 whenever both the D and
G pins are 1. This depicts the transparent behavior of a latch. Equation E3
also shows that when the enable pin G is 0, the output is the previous state.
Note the absence of the clock variables CK, CK+ in E3.
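As an illustration, the next-state behavior described for the D flip-flop and the D-latch can be written as small Boolean functions. This is a sketch only; the function and parameter names are assumptions, with ck_plus standing for the clock value just after the present time (CK+), and the expressions follow the behavior stated in the text for E2 and E3.

```python
def d_flip_flop_next(ck, ck_plus, d, q):
    """Next Q of a rising-edge D flip-flop.

    On a 0 -> 1 clock transition (ck is 0 now, ck_plus is 1) the output
    takes the value of D; at all other times the previous state Q is held.
    """
    rising_edge = (not ck) and ck_plus
    hold = ck or (not ck_plus)
    return (rising_edge and d) or (hold and q)


def d_latch_next(g, d, q):
    """Next Q of a D-latch: transparent when G is 1, holds Q when G is 0."""
    return (q and not g) or (d and g)


captured = d_flip_flop_next(ck=False, ck_plus=True, d=True, q=False)
held = d_flip_flop_next(ck=True, ck_plus=True, d=True, q=False)
transparent = d_latch_next(g=True, d=True, q=False)
latched = d_latch_next(g=False, d=False, q=True)
```

Here captured and transparent evaluate to True because D is passed through, while held keeps the previous Q of False and latched keeps the previous Q of True.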

Equation E1 has two parts: a part that depicts the synchronous
behavior of the sequential cell and a part that depicts the asynchronous
behavior. If either of the asynchronous functions f10 or f11 is equal to 1, then
the value of Q+ must be equal to 1 regardless of any of the other components
of the equation. Similarly, if either f00 or f01 is equal to 1, the value of Q+
must be 0. The synchronous behavior is always expressed in relation to a
clock edge. If there is a transition in the clock function (f_clk) from 0 to 1, the
output should follow the value of the synchronous functionality (f_sync) of the
cell. At all other times, Q+ remains in the "previous" state (Q).

In an analogous manner, the QB output of the cell can be written as:

Lemma 1: If f00 = 0 and f11 = 0, then Q+ = ¬QB+.
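Lemma 1 can be checked by brute force under an assumed form of the generic model. In the sketch below, the expression for Q+ follows the verbal description of Equation E1 (asynchronous set functions force 1, asynchronous clear functions force 0, a rising clock edge loads f_sync, otherwise Q is held). The expression for QB+ is an assumption obtained by mirroring that description with the roles of the asynchronous functions taken from Table 1; it is not quoted from the text. The function name gen_next is likewise hypothetical.

```python
from itertools import product


def gen_next(f_sync, f_clk, f_clk_plus, f00, f01, f10, f11, q, qb):
    """Return (Q+, QB+) for the generic sequential cell (assumed form)."""
    edge = (not f_clk) and f_clk_plus        # clock function rises 0 -> 1
    hold = f_clk or (not f_clk_plus)         # every other clock situation
    q_plus = (((edge and f_sync) or (hold and q))
              and not f00 and not f01) or f10 or f11
    qb_plus = (((edge and not f_sync) or (hold and qb))
               and not f00 and not f10) or f01 or f11
    return q_plus, qb_plus


# Check Lemma 1: with f00 = f11 = 0, Q+ is the complement of QB+.
# Assumption A1 is respected (f01 and f10 never both 1), and the present
# outputs are taken to be complementary (QB = not Q).
lemma_holds = True
for f_sync, f_clk, f_clk_plus, f01, f10, q in product([False, True], repeat=6):
    if f01 and f10:
        continue  # excluded by assumption A1
    q_plus, qb_plus = gen_next(f_sync, f_clk, f_clk_plus,
                               False, f01, f10, False, q, not q)
    if q_plus != (not qb_plus):
        lemma_holds = False
```

Under these assumptions the check passes for every admissible input combination. When f00 or f11 is allowed to be 1, both outputs are forced to the same value and the complement relation no longer holds.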



2.4.5.3 Task 3: Characterizing the Generic Sequential Element

As described in the previous section, the generic sequential element has
inputs sync, ck, f00, f01, f10 and f11. These inputs are assumed to be Boolean
logic functions of primary variables x_i.
These primary variables are assumed to have a static probability and
a toggle rate. The generic sequential elements also have outputs Q and QB.
For flip-flops, the static probability for Q is given by:



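$$\Pr(Q) = \Pr(f_{sync}) \cdot \Pr\left(\overline{f_{00}} \cdot \overline{f_{01}} \cdot \overline{f_{10}} \cdot \overline{f_{11}}\right) + \Pr(f_{10} + f_{11})$$

The static probability for QB is given by:

$$\Pr(QB) = \Pr\left(\overline{f_{sync}}\right) \cdot \Pr\left(\overline{f_{00}} \cdot \overline{f_{01}} \cdot \overline{f_{10}} \cdot \overline{f_{11}}\right) + \Pr(f_{01} + f_{11})$$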

The toggle rate for Q is given by: 



The toggle rate for QB is given by: 



To compute Pr(sync ⊕ Q) and Pr(sync ⊕ QB), treat Q and QB as
primary inputs with the static probabilities computed above. Note that sync
can be a function of Q or QB. If sync is a function of Q or QB, then this
sequential cell has a combinational feedback path from the flip-flop's output
back to one of its inputs. As described in Section 2.4.2, all such self-loops
were identified and flagged in step 5015. That was performed specifically to
provide information for this step of computing the static probabilities and
toggle rates. If the examined flip-flop was not flagged as a self-loop node,
sync can also be treated as a primary input, simplifying the computation of the
static probability and toggle rate values. However, if the flip-flop was flagged
as a self-loop node, sync must be expanded as a function of the primary inputs
feeding that level of the SEG.

The equations are different for sequential elements that are latches.



- ^'{/o(/oi J + ^H^/o/oi (fio +/n) J 



5 _ fr(/oi+^,^ 

" l-'*4^o)+M/^o(/oi+/ii) ) 

The toggle rates are given by: 

2.5 Examples of Transfers Across Sequential Logic Elements

Referring to Figure M, there is shown a generalized logic diagram
illustrating an exemplary electronic circuit and the organization of a
corresponding netlist stored in electronic memory that represents the gates and
wires of such circuit. The circuit has primary inputs IA-ID. It has groups of
combinational logic, CL1, CL2 and CL3. It includes sequential elements SE1
and SE2. Primary inputs IA and IB feed CL1. Primary inputs IB, IC and
ID feed CL2. CL1 feeds SE1. SE1 feeds CL3. CL2 feeds both SE1 and
SE2. CL3 feeds SE2.

The circuit is presumed to correspond to an MSEG (not shown) which
has been levelized. The levelization has resulted in a grouping of sequential
elements and a grouping of combinational logic into different graph levels.
Specifically, the primary inputs IA-ID are grouped in level L0. The group of
combinational logic represented by CL1 is grouped in L1 since it feeds SE1,
and since there is no other sequential element interposed between CL1 and the
primary inputs that feed CL1. The group of combinational logic represented
by CL2 is grouped in both L1 and L2 since the group of logic represented by
CL2 propagates signals to SE1 and SE2. In general, a group of combinational
logic is placed in the same level(s) as the sequential element(s) fed by such
group of combinational logic. Thus, although CL2 feeds both SE1 and SE2,
SE1 is considered to be part of L1, and SE2 is considered to be part of L2.
CL3 is grouped in L2 since it only feeds SE2.

Stated differently, SE1 is at a "lower" or "earlier" or "prior" graph level
to SE2, and SE2 is at a "higher" or "later" or "subsequent" graph level to SE1.
Similarly, logic CL3 is at a subsequent graph level to CL1, and CL1 is at a
prior graph level to CL2 and CL3. Prior logic levels feed subsequent levels
in the logic flow of Figure M.

The MSEG (not shown) that corresponds to the circuit of Figure M
includes a directed arc(s) that corresponds to the group of logic represented
by CL1. Other directed arc(s) correspond to the group of logic represented
by CL2. Yet another directed arc corresponds to the group of logic
represented by CL3. The MSEG includes a graph node which corresponds
to SE1 and includes another graph node that corresponds to SE2. Of course,
if loops have been broken in the course of producing the MSEG, then a graph
node corresponding to SE1 may have been removed and replaced by a split
source node and a split load node. Similarly, a graph node corresponding to
SE2 may have been removed and replaced by a split source node and a split
load node.

Processing of the netlist that represents the circuit involves first
identifying arcs in L1 of the MSEG (not shown) and correlating those L1 arcs
with the groups of combinational logic of the circuit represented by
combinational logic CL1 and CL2. Switching activity values are computed
for the groups of netlist nodes stored in memory that represent CL1 and CL2.
The computed switching activity values are provided to the input side of SE1.
The values are transferred across SE1 to the output side of SE1, where they
become available as primary inputs to the netlist nodes and nets that represent
the group of combinational logic CL3.

Once processing of L1 of the graph is complete, then processing of L2
begins. An MSEG arc that corresponds to combinational logic CL3 is used
to identify a group of netlist nodes stored in memory that represent the group
of combinational logic CL3. Similarly, an arc of the MSEG that corresponds
to CL2 is used to identify a group of netlist nodes stored in memory that
correspond to the group of combinational logic CL2. The processing of the
groups of combinational logic CL2 and CL3 results in the provision of
primary outputs from L2, which are provided to the input side of SE2. These
primary outputs from L2 are transferred across SE2 and serve as a basis for
computing the outputs of SE2.

It should be appreciated that there are a number of techniques for
computing switching activity values for nets representing combinational logic
in a particular level. In a presently preferred implementation of the invention,
static probabilities (SPs) and transition rates (TRs) are computed using BDDs.
However, alternatively, different switching activity measures and computation
techniques may be employed. For example, correlation coefficients or
transition probabilities may be calculated instead. Moreover, the computation
of switching activity levels may be accomplished using netlist nodes rather
than BDDs. Note that nets corresponding to different graph levels are
processed substantially independently of each other. Although outputs from
one level may serve as a basis for inputs to a next level, actual computation
of switching activity values progresses on a level-by-level basis.

Techniques in accordance with a current implementation of the
invention provide an efficient mechanism for accomplishing a transfer of
primary output switching activity values computed for one level of a graph
across a node representing a sequential logic element so that those primary
output values can serve as a basis for primary inputs for a next level of the
graph. For example, in Figure M, there is a transfer of values computed for
L1 across a sequential element SE1, and there is a transfer of values computed
for L2 across sequential element SE2. The transfer across SE1 involves input
of values to SE1 which, in essence, are the primary outputs of the group of
combinational logic CL1. Likewise, CL3 primary output values provided as
input to SE2 serve as a basis for values output from SE2.

A transfer across a sequential element such as SE1 or SE2 can be
challenging because there are numerous types of sequential elements. For
example, see the list of sequential elements identified in Tables 2 and 3. While
certain sequential elements merely require a relatively straightforward transfer
of an input value to an output terminal (see the D flip flop, for example), other
sequential elements involve outputs that are complex functions of logical
inputs, timing signals and prior logical outputs. The current invention
provides mechanisms for transfer of input values across a wide variety of
types of simple or complex sequential elements.

A currently preferred embodiment of the invention employs a generic
sequential cell (GEN) which is illustrated in Figure N. Table 1 explains the
functionality of the various input and output pins of the GEN cell. The
generic model provided by the GEN serves as a basis for the transfer of
switching activity values across any of numerous types of sequential elements.
In Figure N, dashed line 3030 represents an input side of a given sequential
element (SE) that can be modeled using the GEN cell. Dashed line 3032
represents an output side of the given sequential element (SE).

In the example shown in Figure N, the given SE is a JK flip flop with
clear and set. This type of sequential element is indicated as entry "13" in
Table 2. The GEN driving function identified as f_sync is derived for the JK flip
flop using the logic equation indicated for entry 13 in the column headed
f_sync. The GEN driving function f_clk is provided as the CK input. The respective
f00, f01 and f10 driving functions are derived from the corresponding logic
functions indicated in entry 13. f11 is set to a logic zero. Referring to Table
3, a Q plus output is derived from the logic function indicated at entry 13.
The value of QB is derived from its logic function in Table 3, entry 13.
In operation, the JK flip flop with clear and set is modeled using the
GEN cell. Combinational logic in level i produces switching activity values
for CL, ST, CK, J, K, and Q. These values are provided as an SE input. In
accordance with the currently preferred techniques of the present invention,
BDDs are constructed to represent the logic functions indicated for each of the
inputs to the GEN cell. For example, BDDs representative of the logic
function of column f_sync, entry 13 are produced within the logic cone indicated
by dashed lines 3034. When the BDD for logic cone 3034 is evaluated, it
produces an SP, TR or switching activity value for f_sync. Similarly, respective
BDD logic cones 3036, 3038, and 3040 represent BDDs corresponding to
entries under respective columns f00, f01 and f10. The value of f_clk is the value
provided by the level i logic as the SE input for CK. The value of f11 is
fixed at zero.

Referring to Table 3, the value of Q plus is evaluated according to
table entry 13. As indicated in Figure N, a BDD logic cone represents the
logical function indicated by entry 13. When that BDD is evaluated, the
switching activity value it provides is the value for Q plus. That value serves
as the SE output. The SE output can serve as a primary input to a level i+1
of logic. The QB plus switching activity value is computed in a similar
manner.

Thus, the GEN serves as a primitive electronic structure in memory
which supports generic forcing functions and is used to compute generic
outputs. In order to model a specific sequential element using the primitive
electronic structure, a data structure is provided in memory which relates
behavioral information about specific sequential elements to be modeled to
forcing functions and outputs of the GEN primitive. Tables 2 and 3 provide
these behavioral relationships in the present invention. Logic is generated in
memory to convert specific sequential element inputs and outputs into generic
cell inputs and outputs. In the present embodiment, the generated values are
used in accordance with the equations below to compute SPs and TRs.

3.0 Mixing Simulation-based and Probability-based Analysis

The previous sections described the process for computing the static
probabilities and the toggle rates for all nets in a design given the static
probabilities and toggle rates of the design's primary input ports. The
described method also allows the setting of the static probability and toggle
rate of internal nets. Those nets are then considered as start-points and are
treated like primary input ports. By setting their static probability and toggle
rate, the designer indicates that those values should not be changed during the
processing.

Setting the static probabilities and toggle rates of select nets allows
improved accuracy and shorter run-times. The accuracy improvements come
about because there are fewer nets that have estimated static probability and
toggle rate values. The run-time improvements are achieved because the
sequential processing may not require iteration, or it may require less iteration
as a result of the additional start-point nodes.

The ability to support a hybrid analysis technique that combines
simulation-based and probability-based techniques enables these additional
improvements to the accuracy and efficiency of the power estimation.

User Interface for Power Estimation

The following is a description of the user interface commands which provide
access to and control of the power estimation tool.

Key attributes of the power estimation user interface are as follows:
• Allows the user to define a clock, which can be referred to either explicitly
or implicitly by other commands. The clock is the synchronous signal
which typically determines the maximum frequency of the network.
This clock signal is referred to in the next section.
• Allows report_power to be run with partial or no information for
toggle_rate and static_probability on input ports of the design. If the
user does not provide toggle_rate for an input port, then a default
value of .5 * related_clock is assumed. The related_clock is
determined by traversing the network from the input to a sequential element.
The net driving the clock pin is assumed to be the related clock. If
there is no sequential element, then the highest frequency clock is
assumed to be the related_clock. If the user does not provide
static_probability, then a value of .5 is assumed. This is a key
advantage of the user interface in that it allows the user to run
report_power after only providing information about the design's
clock. This reduces the amount of input needed before a power estimate can
be performed. In contrast, the simulation method requires a set of test
vectors before power estimation can occur.

• The user interface allows for any points in the network to be annotated
with toggle rate and static probability. This allows for a power
simulation with a partially annotated network, in which switching
information is provided for a subset of the nodes. Probabilistic
propagation of activity will occur to determine the toggle rate and
static probability for the remaining (non-annotated) signals in the
network. This allows the user to extract simulation data from either
a higher level simulation (i.e., RTL level) or from selected nets. By
extracting from a different level or from selected nets, the user can
speed the extraction of simulation information.
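The default-value behavior described in the attributes above can be summarized in a few lines. The sketch below is illustrative only: the function name default_activity, the dictionaries of user annotations, and the related_clock_frequency argument are hypothetical and do not represent the tool's actual interface.

```python
def default_activity(port, user_toggle_rates, user_static_probabilities,
                     related_clock_frequency):
    """Return (toggle_rate, static_probability) for an input port.

    User-supplied annotations take precedence. Otherwise the toggle rate
    defaults to 0.5 * the related clock frequency and the static
    probability defaults to 0.5, as described in the text.
    """
    toggle_rate = user_toggle_rates.get(port, 0.5 * related_clock_frequency)
    static_probability = user_static_probabilities.get(port, 0.5)
    return toggle_rate, static_probability


# Port "a" has no annotations; port "b" has a user-supplied toggle rate.
unannotated = default_activity("a", {}, {}, related_clock_frequency=100e6)
annotated = default_activity("b", {"b": 10e6}, {}, related_clock_frequency=100e6)
```

Determining the related clock itself (by traversing from the port to the first sequential element) is outside the scope of this sketch.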
Following is the command description (manual pages) for the power
estimation user interface.
NAME
report_power
Calculates and reports dynamic and static power for a design or
instance.
SYNTAX

int report_power [-net] [-cell] [-only cell_or_net_list]
[-cumulative] [-flat] [-exclude_boundary_nets]
[-analysis_effort low | medium | high] [-verbose]
[-nworst number] [-sort_mode mode]
[-histogram [-exclude_leq le_val | -exclude_geq ge_val]]
[-nosplit]

object_list cell_or_net_list
int number
string mode
float le_val
float ge_val
ARGUMENTS

-net -cell

Indicates whether power consumption of nets and/or cells is to be
reported. By default, neither option is enabled, and only the design's
summary power information is reported. The -cell and -net options can be
used singly or together.

-only cell_or_net_list

Specifies a list of cells and/or nets to be displayed with -net or -cell.
With this option, only the cells and/or nets in the cell_or_net_list are
listed in the power report. If both the -net and -only options are
specified, then the cell_or_net_list should contain at least one net.
Similarly, if both the -cell and -only options are specified, then the
cell_or_net_list should contain at least one cell. If the -net, -cell, and
-only options are specified together, the cell_or_net_list should contain
at least one net and one cell.

-cumulative

Indicates that cumulative power is to be computed and displayed for every
net and/or cell in the power report. The fanin cumulative power of a cell or
net is the sum of all power values for cells and nets in the transitive fanin of
the start point. Similarly, the fanout cumulative power of a cell or net is the
sum of all power values for cells and nets in the transitive fanout of the start
point. The cumulative report is displayed after the standard cell or net report.
The -cumulative option is valid only if -net and/or -cell are specified.
report_power annotates the cumulative power values for the specified cells
and/or nets.

-flat

Indicates that the power report should traverse the hierarchy and report
objects at all lower levels (as if the design's hierarchy were flat). The default
is to report objects at only the current level of hierarchy. For cell reports, if
-flat is not specified, the power reported for a subdesign is the total power
estimate for that subdesign, including all of its contents.

-exclude_boundary_nets

Indicates that the power of boundary nets is to be excluded from the
power report; the default is to include all nets. At the top level of a design,
only the primary input nets qualify as boundary nets. For a lower level block
of the design, nets that feed into that block are considered boundary nets. For
boundary nets that are also driven by an enclosed cell, the switching power
is scaled according to the number of internal (versus external) drivers. This
option affects the nets that are chosen to display in the net-specific report as
well as the values of the design's switching power. This option does not
affect leakage power or internal power values.

-analysis_effort low | medium | high

Provides a tradeoff between runtime and accuracy. The default is low.
Low effort results in the fastest runtime and the lowest accuracy of power
estimates; medium and high efforts result in a longer run that has increasing
levels of accuracy. The analysis effort controls the depth of logic that is
traversed to detect signal correlation. Variations of runtime and accuracy
depend greatly on circuit structure.

-verbose

Indicates that additional detailed information is to be displayed about
the power of the cells and/or nets. This option is valid only if -net and/or
-cell are specified.

-nworst number

Indicates that the report is to be filtered so that it displays only the
highest number power objects. This option is valid only if -net and/or
-cell is specified.

-sort_mode mode

Determines the sorting mode for report order and -nworst selection.
The available sorting modes for the -net or -cell options are listed below.

-net option               -cell option
----------------------    ----------------------
name                      name
cumulative_fanout         cumulative_fanout
cumulative_fanin          cumulative_fanin
net_static_probability    cell_internal_power
net_switching_power       cell_leakage_power
net_toggle_rate           dynamic_power
total_net_load

If both the -net and -cell options are specified and a sorting mode is
explicitly selected, the selected sorting mode is used for both cell and net
reports. Therefore, you must select a sorting mode that applies to both the
-net and -cell options.

If the sorting mode is not explicitly set, a default is chosen based on
the mode of the report_power command:

Mode          Implicit Default
----------    -------------------
-net          net_switching_power
-cell         cell_internal_power
-net -cell    dynamic_power

-histogram [-exclude_leq le_val | -exclude_geq ge_val]

Indicates that a histogram-style report is to be displayed showing the
number of nets in each power range. -exclude_leq and -exclude_geq can be
used to exclude data values less than le_val or greater than ge_val,
respectively. This is useful for displaying the range and variation of power
in the design. This option displays the histogram report only if either -net
or -cell is specified.

-nosplit

Most of the design information is listed in fixed-width columns. If the
information for a given field exceeds its column's width, the next field begins
on a new line, starting in the correct column. This option prevents
line-splitting and facilitates writing software to extract information from the
report output.

DESCRIPTION

Calculates and reports power for a design. The probabilistic estimation
algorithm functions on nets that were not explicitly annotated with switching
activity values. During the probabilistic propagation, report_power uses the
start point nets' switching activity values, if available, when computing the
switching activity values for internal nets. The switching activity values are
retained for any nets that were annotated with the set_switching_activity
command; that is, the values are not overwritten during the probabilistic
propagation.

Options allow you to specify cells and/or nets for reporting. The
default operation is to display the summary power values for only the
current_design. If a current_instance is specified, report_power instead
displays the summary power values for that instance. The instance's power
is estimated in the context of the higher-level design; that is, using the
switching activity and load of the higher-level design.

The -verbose option causes more detailed power information to be
displayed. The -flat, -exclude_boundary_nets, -nworst, and -sort_mode
options allow filtering of objects that are selected by report_power. The
-exclude_boundary_nets option also affects the way that the design's power
values are computed by excluding certain nets from the design's totals. The
-sort_mode option also affects the formatting of the power reports by
modifying the order of nets and/or cells that are displayed by report_power.
The -cumulative and -histogram options cause additional sections to be
displayed in the power reports. The cumulative power section contains
transitive fanin and fanout values for cells and/or nets in the design. The
power histogram classifies the nets or cells into groups of power values,
allowing for easier visual analysis of the range of power values and of the
distribution of the nets/cells across that range. Suboptions allow pruning of
objects in the histogram by excluding values greater than or less than specified
values.
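
The cumulative fanin and fanout values described above amount to a sum over a transitive closure of the connectivity graph. The following Python sketch illustrates the idea under stated assumptions: the function names and the small connectivity map are hypothetical, the start point is counted in its own total, and the traversal simply follows whatever edges it is given (how the tool treats sequential elements is not modeled). The three power values are taken from the ONEHOT_gated sample output later in this document, where they appear consistent with this reading.

```python
def transitive_closure(start, edges):
    """Return the start point plus every object reachable from it via edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def cumulative_power(start, edges, power):
    """Sum the power of every object in the transitive closure of start."""
    return sum(power[node] for node in transitive_closure(start, edges))


# Power values in report units (10uW). The fanin topology below, with nets
# clkb and gateb feeding cell U33, is an assumption made for illustration.
power = {"U33": 64.66894, "clkb": 14.81240, "gateb": 0.43400}
fanin_edges = {"U33": ["clkb", "gateb"]}

u33_fanin = cumulative_power("U33", fanin_edges, power)
```

Passing a fanout map instead of a fanin map gives the cumulative fanout power with the same code.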

If they are not specifically annotated with switching activity
information, all input ports and black-box cell outputs are assumed to have a
default static probability of 0.5 and a toggle rate of (0.5 fclk), where fclk
is the toggle rate of the object's related clock.

Power analysis uses any back-annotated net loads during the power
calculation. For nets that do not have back-annotated capacitance, the net load
is estimated from the appropriate wireload model. If any cluster information
has been annotated on the design (Floorplan Manager), DesignPower uses the
improved capacitance estimates from the cluster's wireloads.

When invoked from within dc_shell (Design Compiler), the
report_power command first checks out a DesignPower license. If a license
is not available, the command terminates with an error message. Otherwise,
the command proceeds normally. At the completion of the command, the
DesignPower license is released. To prevent the release of the license at the
completion of the report_power command, you can set the environment
variable power_keep_license_after_power_commands to false.
The above variable is valid only under dc_shell (Design Compiler).
Under dp_shell (standalone DesignPower), the DesignPower license can never
be released because it is required to run the executable.

EXAMPLES

The following example shows a report_power summary report. A
medium effort analysis is performed to estimate the design's power values.

dc_shell> report_power -analysis medium
Information: Updating design information... (UID-85)
Performing probabilistic propagation through design.

Report : power
        -analysis_effort medium
Design : ALARM_BLOCK
Version: v3.2a
Date   : Sun Jun 19 15:45:24 1994

Library(s) Used:

    power_lib.db (File: /remote/libraries/power_lib.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design                        Wire Loading Model    Library
--------------------------    ------------------    ------------
ALARM_BLOCK                   0.5K_TLM              power_lib.db
ALARM_STATE_MACHINE           0.5K_TLM              power_lib.db
ALARM_COUNTER                 0.5K_TLM              power_lib.db
ALARM_COUNTER_DW01_inc_6_0    0.5K_TLM              power_lib.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

Cell Internal Power  = 165.1648 uW  (32%)
Net Switching Power  = 348.8617 uW  (67%)
                       -----------
Total Dynamic Power  = 514.0266 uW (100%)

Cell Leakage Power   =  76.0000 nW

The following example shows a net power report sorted by
net_switching_power and filtered to display only the 5 nets with highest
switching power. A low effort analysis is performed to estimate the design's
power values.

dc_shell> report_power -net -flat -nworst 5

Report : power
        -net
        -analysis_effort low
        -nworst 5
        -flat
        -sort_mode net_switching_power
Design : ALARM_BLOCK
Version: v3.2a
Date   : Sun Jun 19 15:45:26 1994
****************************************

Library(s) Used:

    power_lib.db (File: /remote/libraries/power_lib.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design                        Wire Loading Model    Library
--------------------------    ------------------    ------------
ALARM_BLOCK                   0.5K_TLM              power_lib.db
ALARM_STATE_MACHINE           0.5K_TLM              power_lib.db
ALARM_COUNTER                 0.5K_TLM              power_lib.db
ALARM_COUNTER_DW01_inc_6_0    0.5K_TLM              power_lib.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

                     Total       Static    Toggle    Switching
Net                  Net Load    Prob.     Rate      Power        Attrs
-----------------    --------    ------    ------    ---------    -----
ACOUNT/CLK           20.467      0.500     0.1000    115.5149
ACOUNT/n493          23.193      0.985     0.0250    32.7255
ASM/n225             9.165       0.985     0.0250    12.9314
ACOUNT/HRS_OUT[3]    6.365       0.537     0.0303    10.8763
ACOUNT/HRS_OUT[2]    5.161       0.537     0.0303    8.8202
-----------------------------------------------------------------
Total (5 nets)                                       18.0868 uW
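
As a sanity check on the net report above, the switching power column can be approximately reproduced from the load, toggle rate, and unit information printed in the header. The sketch below assumes the conventional relation P = 0.5 * C * V^2 * TR, with the toggle rate counting both rising and falling transitions; the function name is hypothetical and the relation is an assumption made here, not something the report itself states.

```python
def net_switching_power_uw(load_units, toggle_rate_per_ns,
                           voltage=4.75, cap_unit_ff=50.029999):
    """Estimate net switching power in microwatts.

    load_units          total net load in library capacitance units
    toggle_rate_per_ns  transitions (0->1 and 1->0) per nanosecond
    voltage             global operating voltage in volts
    cap_unit_ff         size of one capacitance unit in femtofarads
    """
    cap_farads = load_units * cap_unit_ff * 1e-15
    toggles_per_second = toggle_rate_per_ns * 1e9
    watts = 0.5 * cap_farads * voltage ** 2 * toggles_per_second
    return watts * 1e6


# Rows taken from the report above; results are converted to the report's
# dynamic power unit of 10uW for comparison with the Switching Power column.
clk_units = net_switching_power_uw(20.467, 0.1000) / 10.0
n493_units = net_switching_power_uw(23.193, 0.0250) / 10.0
```

Under these assumptions the two results land within about 0.001 report units of the printed 115.5149 and 32.7255; the small residue is consistent with the load and toggle rate being printed to only three or four decimal places.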

The following example displays a cell report, in which an additional
cumulative cell power report is generated. The cells are sorted by cumulative
fanout power values, and only the top 5 are reported. A low effort analysis
is performed to estimate the design's power values.

dc_shell> report_power -cell -flat -cumulative -sort_mode
cumulative_fanout -nworst 5

Report : power
        -cell
        -analysis_effort low
        -nworst 5
        -cumulative
        -flat
        -sort_mode cumulative_fanout
Design : ALARM_BLOCK
Version: v3.2a
Date   : Sun Jun 19 15:45:28 1994
****************************************

Library(s) Used:

    power_lib.db (File: /root/libraries/power_lib.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design                        Wire Loading Model    Library
--------------------------    ------------------    ------------
ALARM_BLOCK                   0.5K_TLM              power_lib.db
ALARM_STATE_MACHINE           0.5K_TLM              power_lib.db
ALARM_COUNTER                 0.5K_TLM              power_lib.db
ALARM_COUNTER_DW01_inc_6_0    0.5K_TLM              power_lib.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

Attributes
    h - Hierarchical cell

                          Cell        Driven Net    Tot Dynamic      Cell
                          Internal    Switching     Power            Leakage
Cell                      Power       Power         (% Cell/Tot)     Power       Attrs
----------------------    --------    ----------    -------------    --------    -----
ACOUNT/MINS_OUT_reg[1]    3.8997      13.2200       17.120 (22%)     1.0000
ACOUNT/MINS_OUT_reg[3]    10.8977     2.0806        12.978 (83%)     1.0000
ACOUNT/MINS_OUT_reg[0]    10.8987     2.0744        12.973 (84%)     1.0000
ACOUNT/MINS_OUT_reg[4]    10.8974     2.0869        12.984 (83%)     1.0000
ACOUNT/MINS_OUT_reg[5]    10.8977     2.0770        12.975 (83%)     1.0000
--------------------------------------------------------------------------------------
Totals (5 cells)          4.7491uW    2.1538uW      6.903uW (68%)    5.0000nW

                          Cumulative          Cumulative
                          Transitive Fanin    Transitive Fanout
Cell                      Power               Power
----------------------    ----------------    -----------------
ACOUNT/MINS_OUT_reg[1]    17.11972            182.40425
ACOUNT/MINS_OUT_reg[3]    12.97823            173.69908
ACOUNT/MINS_OUT_reg[0]    12.97306            173.68782
ACOUNT/MINS_OUT_reg[4]    12.98429            172.32205
ACOUNT/MINS_OUT_reg[5]    12.97466            172.30254
----------------------------------------------------------------
(5 cells)

SEE ALSO
set_switching_activity (2),
power_keep_license_after_power_commands (3).

NAME
set_switching_activity
Sets (or resets) switching activity information (toggle_rate,
static_probability) for nets, pins or ports of the design.

SYNTAX

int set_switching_activity [-static_probability sp_value]
[-toggle_rate tr_value] [-period period_value | -clock clock_name]
object_list

float sp_value
float tr_value
float period_value
string clock_name
list object_list

ARGUMENTS

-static_probability sp_value

Indicates the probability that the signal is in the logic 1 (high)
state. sp_value is a floating point number that specifies the fraction of
time that the signal is in the logic 1 state. For example, an sp_value of .25
indicates that the signal is in the logic 1 state 25% of the time. If this
option is not specified, then no value will be annotated and report_power
will assume a value of 0.5.

-toggle_rate tr_value

Specifies the toggle rate; that is, the number of 0->1 and 1->0
transitions that the signal makes during a period of time. The period can
be specified with the -clock option (in which case the clock's base period
will be used) or with the -period option (in which case period_value
will be used as the signal's period). tr_value can be any positive floating
point number. If this option is not specified, then the toggle rate will not
be annotated and report_power will assume a value of 2*sp(1-sp)*fclk.
fclk represents the frequency of the signal's related clock (if one can be
determined). If a related clock cannot be determined, the highest-activity
clock in the design will be used to scale the toggle_rate of this net.

-period period_value

Specifies the time period in which the toggle rate tr_value occurs;
usually the simulation time or the clock period. The units of time are those
of the technology library (typically ns). If neither -clock nor -period is
specified, a period_value of 1 time unit is assumed. -period and -clock are
mutually exclusive.

-clock clock_name

Specifies the clock object to which tr_value is related. The provided
clock object must have already been created using create_clock. The
period of clock_name is divided into the toggle rate tr_value to calculate
the internal absolute toggle rate. If neither -clock nor -period is specified,
a period_value of 1 time unit is assumed. -period and -clock are mutually
exclusive.

DESCRIPTION

Sets switching activity information (toggle_rate, static_probability) for
nets, pins or ports of the design. report_power uses this information to
calculate dynamic power values. The toggle_rate and static_probability
should be defined for all inputs of a design in order to achieve accurate
results from the report_power dynamic analysis. If the
set_switching_activity command is used without any options, then the
switching activity attributes for the specified nets will be reset
(uninitialized). For details about power reports, refer to the report_power
command man page.

EXAMPLES

The following example shows a simulation period of 1320 in which
33 net toggles were recorded. A static probability of .015 is set. Note that
the internal toggle rate computed is (toggle_rate/clock_period = 33/1320 =
.025).

dc_shell> set_switching_activity -period 1320 -toggle_rate 33
-static_prob 0.015
all_inputs()

The following example shows how the same values can be set using
the -clock option.

The example assumes that a clock named CLK has been created with
a clock period of 20. Note that the internal toggle rate computed is
(toggle_rate/clock_period = .5/20 = .025).

dc_shell> set_switching_activity -clock CLK -toggle_rate .5
-static_prob 0.015 all_inputs()

The following example shows the use of set_switching_activity to set
activities on internal nets in the design by referencing a pin. Typically,
this is the best way to back-annotate simulation toggle rate information.

dc_shell> set_switching_activity -clock CLK -toggle_rate .005
find(pin, "ASM/CURRENT_STATE_reg[0]/QZ")

SEE ALSO

create_clock (2),
report_power (2).
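
The arithmetic in the set_switching_activity manual page above can be restated as two small Python functions. This is a sketch only: the function names are hypothetical, and the code simply evaluates the two formulas quoted in the text, namely the internal toggle rate tr_value / period and the default toggle rate 2*sp*(1-sp)*fclk.

```python
def internal_toggle_rate(tr_value, period_value=1.0):
    """Toggle rate per time unit: tr_value transitions observed over period_value.

    A period of 1 time unit is used when neither -period nor -clock is given,
    as described in the manual page.
    """
    if period_value <= 0:
        raise ValueError("period_value must be positive")
    return tr_value / period_value


def default_toggle_rate(sp, f_clk):
    """Toggle rate assumed when none is annotated: 2 * sp * (1 - sp) * fclk."""
    if not 0.0 <= sp <= 1.0:
        raise ValueError("static probability must lie between 0 and 1")
    return 2.0 * sp * (1.0 - sp) * f_clk


# The two manual-page examples: 33 toggles in a period of 1320, and a
# toggle rate of .5 relative to a clock with a period of 20.
rate_from_period = internal_toggle_rate(33, 1320)
rate_from_clock = internal_toggle_rate(0.5, 20)

# An unannotated signal with sp = 0.5 and a related clock rate of 0.1 per ns.
rate_default = default_toggle_rate(0.5, 0.1)
```

Both examples reduce to the same internal value, which is why the manual page describes them as setting the same activity.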


Sample Input

/* Indicates synthesis library which contains cell models */
link_library = power_COM_MAX.db
/* Read in Compiled Gate Level Design Database */
read onehot_gated_compiled.db

/* Define Clock Object */
create_clock clk -period 20
set_load 1.03 all_outputs()

/* Reads list of commands which set port toggle Activity */
include port_toggle.scr

/* Reports power using probabilistic propagation */
report_power

report_power -net -cumulative -sort_mode net_switching_power -nworst 20

/* Report power by cell with histogram */
report_power -cell -cumulative -sort_mode dynamic_power -nworst 20
-histogram

/* Report power by cell with -flat (thru hierarchy) */
report_power -cell -flat

/*=========================================*/
/* Include Simulation Toggles for Some Internal Nets */
include partial_sim_toggle.scr

/* Report power using hybrid mixture of simulation and probabilistic
propagation */
report_power -net -nworst 10

/*=========================================*/
/* Include Simulation Toggles for Some Internal Nets */
/*=========================================*/
include sim_toggle.scr

/* Report power using hybrid mixture of simulation annotation only */
report_power -net -nworst 10

quit

Sample Output

Behavioral Compiler (TM)
DC Professional (TM)
DC Expert (TM)
ECL Compiler (TM)
FPGA Compiler (TM)
VHDL Compiler (TM)
HDL Compiler (TM)
Library Compiler (TM)
Test Compiler (TM)
Test Compiler Plus (TM)
CTV-Interface
DesignWare Developer (TM)
DesignTime (TM)
DesignPower (TM)

Version v3.3a-slot3a - Feb 27, 1995
Copyright (c) 1988-1995 by Synopsys, Inc.
ALL RIGHTS RESERVED

This program is proprietary and confidential information of Synopsys,
Inc. and may be used and disclosed only as authorized in a license
agreement controlling such use and disclosure.

Initializing...

/* Indicates synthesis library which contains cell models */
link_library = power_COM_MAX.db
{"power_COM_MAX.db"}

/* Read in Compiled Gate Level Design Database */
read onehot_gated_compiled.db
Loading db file '/remote/rd24/smeier/design/power/tutorial/onehot_gated_compiled.db'
Current design is now '/remote/rd24/smeier/design/power/tutorial/onehot_gated_compiled.db:ONEHOT_gated'
{"ONEHOT_gated"}

/* Define Clock Object */
create_clock clk -period 20
Loading db file '/am/remote/dacl/Power_Demo/lib/power_COM_MAX.db'
Information: Updating technology library (please save) ... (UIL-34)
Loading db file '/remote/src/syn/ice/dev/libraries/syn/gtech.db'
Loading db file '/remote/src/syn/ice/dev/libraries/syn/standard.sldb'
Performing create_clock on port 'clk'.
1

set_load 1.03 all_outputs()
Performing set_load on port 'count[15]'.
Performing set_load on port 'count[14]'.
Performing set_load on port 'count[13]'.
Performing set_load on port 'count[12]'.
Performing set_load on port 'count[11]'.
Performing set_load on port 'count[10]'.
Performing set_load on port 'count[9]'.
Performing set_load on port 'count[8]'.
Performing set_load on port 'count[7]'.
Performing set_load on port 'count[6]'.
Performing set_load on port 'count[5]'.
Performing set_load on port 'count[4]'.
Performing set_load on port 'count[3]'.
Performing set_load on port 'count[2]'.
Performing set_load on port 'count[1]'.
Performing set_load on port 'count[0]'.
1

/* Reads list of commands which set port toggle Activity */
include port_toggle.scr

set_switching_activity -period 340 -toggle_rate 1 -static_prob 0.944444
find(port, "reset");
Performing set_switching_activity on port 'reset'.
1

set_switching_activity -period 340 -toggle_rate 1 -static_prob 0.5
find(port, "gate");
Performing set_switching_activity on port 'gate'.
1

set_switching_activity -period 20 -toggle_rate 2 -static_prob 0.5
find(port, "clk");
Performing set_switching_activity on port 'clk'.
1

1


/* Reports power using probabilistic propagation */
report_power
Information: Updating design information... (UID-85)
Performing probabilistic propagation through design.

Report : power
        -analysis_effort low
Design : ONEHOT_gated
Version: v3.3a-slot3a
Date   : Wed Mar 1 20:45:52 1995

Library(s) Used:

    power_COM_MAX.db (File: /am/remote/dacl/Power_Demo/lib/power_COM_MAX.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design          Wire Loading Model    Library
------------    ------------------    ----------------
ONEHOT_gated    0.5K_TLM              power_COM_MAX.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

Cell Internal Power  = 300.2616 uW  (24%)
Net Switching Power  = 955.5177 uW  (76%)
                       -----------
Total Dynamic Power  =   1.2558 mW (100%)
Cell Leakage Power   =  18.0000 nW

1

report_power -net -cumulative -sort_mode net_switching_power -nworst 20

Report : power
        -net
        -analysis_effort low
        -nworst 20
        -cumulative
        -sort_mode net_switching_power
Design : ONEHOT_gated
Version: v3.3a-slot3a
Date   : Wed Mar 1 20:45:53 1995

Library(s) Used:

    power_COM_MAX.db (File: /am/remote/dacl/Power_Demo/lib/power_COM_MAX.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design          Wire Loading Model    Library
------------    ------------------    ----------------
ONEHOT_gated    0.5K_TLM              power_COM_MAX.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

Attributes
    a - Switching activity information annotated on net

               Total          Static         Toggle    Switching
Net            Net Load       Prob.          Rate      Power        Attrs
-----------    -----------    -----------    ------    ---------    -----
gated_clock    21.730         0.250          0.0515    63.1248
clkb           2.624          0.500          0.1000    14.8124      a
resetb         [illegible]    [illegible]    0.0029    5.4082       a
count35x1x     [illegible]    0.101          0.0047    1.0379
count35x2x     3.908          0.095          0.0044    0.9795
count35x3x     3.908          0.090          0.0042    0.9305
count35x4x     3.908          0.085          0.0040    0.8836
count35x5x     3.908          0.080          0.0038    0.8388
count35x6x     3.908          0.076          0.0036    0.7961
count35x7x     3.908          0.072          0.0034    0.7553
count35x8x     3.908          0.068          0.0032    0.7164
count35x9x     3.908          0.064          0.0031    0.6793
count35x12x    3.908          0.060          0.0029    0.6441
count35x10x    3.908          0.060          0.0029    0.6440
count35x13x    3.908          0.057          0.0028    0.6105
count35x11x    3.908          0.057          0.0028    0.6104
count35x14x    3.908          0.054          0.0026    0.5785
count35x15x    3.908          0.051          0.0025    0.5481
count35x0x     3.908          0.048          0.0024    0.5192
gateb          2.614          0.500          0.0029    0.4340       a
-------------------------------------------------------------------------
Totals (20 nets)                                       955.5177 uW

               Cumulative          Cumulative
               Transitive Fanin    Transitive Fanout
Net            Power               Power
-----------    ----------------    -----------------
gated_clock    79.91534            64.66894
clkb           14.81240            14.81240
resetb         5.40824             5.40824
count35x1x     2.84542             2.84542
count35x2x     2.78174             2.78174
count35x3x     2.72829             2.72829
count35x4x     2.67717             2.67717
count35x5x     2.62832             2.62832
count35x6x     2.58167             2.58167
count35x7x     2.53717             2.53717
count35x8x     2.49474             2.49474
count35x9x     2.45430             2.45430
count35x12x    2.41597             2.41597
count35x10x    2.41579             2.41579
count35x13x    2.37930             2.37930
count35x11x    2.37913             2.37913
count35x14x    2.34441             2.34441
count35x15x    2.31124             2.31124
count35x0x     2.27970             2.27970
gateb          0.43400             0.43400
-----------------------------------------------------
(20 nets)
1

/* Report power by cell with histogram */
report_power -cell -cumulative -sort_mode dynamic_power -nworst 20
-histogram

Report : power
        -cell
        -analysis_effort low
        -nworst 20
        -cumulative
        -histogram
        -sort_mode dynamic_power
Design : ONEHOT_gated
Version: v3.3a-slot3a
Date   : Wed Mar 1 20:45:53 1995

Library(s) Used:

    power_COM_MAX.db (File: /am/remote/dacl/Power_Demo/lib/power_COM_MAX.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design          Wire Loading Model    Library
------------    ------------------    ----------------
ONEHOT_gated    0.5K_TLM              power_COM_MAX.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

Attributes
    h - Hierarchical cell

                     Cell         Driven Net    Tot Dynamic       Cell
                     Internal     Switching     Power             Leakage
Cell                 Power        Power         (% Cell/Tot)      Power       Attrs
-----------------    ---------    ----------    --------------    --------    -----
U33                  1.5441       63.1248       64.669 (2%)       2.0000
COUNT_REGX0X         1.8075       1.0379        2.845 (64%)       1.0000
COUNT_REGX1X         1.8022       0.9795        2.782 (65%)       1.0000
COUNT_REGX2X         1.7978       0.9305        2.728 (66%)       1.0000
COUNT_REGX3X         1.7935       0.8836        2.677 (67%)       1.0000
COUNT_REGX4X         1.7895       0.8388        2.628 (68%)       1.0000
COUNT_REGX5X         1.7856       0.7961        2.582 (69%)       1.0000
COUNT_REGX6X         1.7819       0.7553        2.537 (70%)       1.0000
COUNT_REGX7X         1.7784       0.7164        2.495 (71%)       1.0000
COUNT_REGX8X         1.7750       0.6793        2.454 (72%)       1.0000
COUNT_REGX11X        1.7718       0.6441        2.416 (73%)       1.0000
COUNT_REGX9X         1.7718       0.6440        2.416 (73%)       1.0000
COUNT_REGX12X        1.7688       0.6105        2.379 (74%)       1.0000
COUNT_REGX10X        1.7688       0.6104        2.379 (74%)       1.0000
COUNT_REGX13X        1.7659       0.5785        2.344 (75%)       1.0000
COUNT_REGX14X        1.7631       0.5481        2.311 (76%)       1.0000
COUNT_REGX15X        1.7605       0.5192        2.280 (77%)       1.0000
-----------------------------------------------------------------------------------
Totals (17 cells)    300.262uW    748.971uW     1.049mW (29%)     18.000nW

                 Cumulative          Cumulative
                 Transitive Fanin    Transitive Fanout
Cell             Power               Power
-------------    ----------------    -----------------
U33              79.91534            64.66894
COUNT_REGX0X     2.84542             2.84542
COUNT_REGX1X     2.78174             2.78174
COUNT_REGX2X     2.72829             2.72829
COUNT_REGX3X     2.67717             2.67717
COUNT_REGX4X     2.62832             2.62832
COUNT_REGX5X     2.58167             2.58167
COUNT_REGX6X     2.53717             2.53717
COUNT_REGX7X     2.49474             2.49474
COUNT_REGX8X     2.45430             2.45430
COUNT_REGX11X    2.41597             2.41597
COUNT_REGX9X     2.41579             2.41579
COUNT_REGX12X    2.37930             2.37930
COUNT_REGX10X    2.37913             2.37913
COUNT_REGX13X    2.34441             2.34441
COUNT_REGX14X    2.31124             2.31124
COUNT_REGX15X    2.27970             2.27970
-------------------------------------------------------
(17 cells)

[Histogram: Number of Cells versus Cell Internal Power (10uW); horizontal
axis marked at 1.544, 1.589, 1.633, 1.678, 1.723, 1.767 and 1.812]

(17 Cells)
1
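
The grouping behind the histogram report above can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, the bin edges are the axis marks printed on the histogram, and the treatment of values that fall exactly on an edge (each bin includes its lower edge, and the last bin also includes its upper edge) is a choice made here rather than documented tool behavior. The 17 values are the Cell Internal Power column of the cell report.

```python
from bisect import bisect_right


def power_histogram(values, edges):
    """Count how many values fall into each bin defined by ascending edges."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if value < edges[0] or value > edges[-1]:
            continue  # outside the plotted range
        # bisect_right finds the first edge greater than value; clamp so
        # that a value equal to the final edge lands in the last bin.
        index = min(bisect_right(edges, value) - 1, len(counts) - 1)
        counts[index] += 1
    return counts


# Cell internal power (10uW units) for the 17 cells in the report above.
cell_internal_power = [
    1.5441, 1.8075, 1.8022, 1.7978, 1.7935, 1.7895, 1.7856, 1.7819, 1.7784,
    1.7750, 1.7718, 1.7718, 1.7688, 1.7688, 1.7659, 1.7631, 1.7605,
]
axis_marks = [1.544, 1.589, 1.633, 1.678, 1.723, 1.767, 1.812]

cells_per_bin = power_histogram(cell_internal_power, axis_marks)
```

With these edges the single low value (U33) sits alone in the first bin and the sixteen register cells cluster in the last two bins, which is the kind of distribution the histogram is meant to make visible at a glance.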



/* Report power by cell with -flat (thru hierarchy) */
report_power -cell -flat
****************************************
Report : power
        -cell
        -analysis_effort low
        -flat
        -sort_mode cell_internal_power
Design : ONEHOT_gated
Version: v3.3a-slot3a
Date   : Wed Mar 1 20:45:54 1995

Library(s) Used:

    power_COM_MAX.db (File: /am/remote/dacl/Power_Demo/lib/power_COM_MAX.db)

Operating Conditions:
Wire Loading Model Mode: enclosed

Design          Wire Loading Model    Library
------------    ------------------    ----------------
ONEHOT_gated    0.5K_TLM              power_COM_MAX.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW (derived from V,C,T units)
    Leakage Power Units = 1nW

Attributes
    h - Hierarchical cell

                     Cell         Driven Net    Tot Dynamic       Cell
                     Internal     Switching     Power             Leakage
Cell                 Power        Power         (% Cell/Tot)      Power       Attrs
-----------------    ---------    ----------    --------------    --------    -----
COUNT_REGX0X         1.8075       1.0379        2.845 (64%)       1.0000
COUNT_REGX1X         1.8022       0.9795        2.782 (65%)       1.0000
COUNT_REGX2X         1.7978       0.9305        2.728 (66%)       1.0000
COUNT_REGX3X         1.7935       0.8836        2.677 (67%)       1.0000
COUNT_REGX4X         1.7895       0.8388        2.628 (68%)       1.0000
COUNT_REGX5X         1.7856       0.7961        2.582 (69%)       1.0000
COUNT_REGX6X         1.7819       0.7553        2.537 (70%)       1.0000
COUNT_REGX7X         1.7784       0.7164        2.495 (71%)       1.0000
COUNT_REGX8X         1.7750       0.6793        2.454 (72%)       1.0000
COUNT_REGX11X        1.7718       0.6441        2.416 (73%)       1.0000
COUNT_REGX9X         1.7718       0.6440        2.416 (73%)       1.0000
COUNT_REGX12X        1.7688       0.6105        2.379 (74%)       1.0000
COUNT_REGX10X        1.7688       0.6104        2.379 (74%)       1.0000
COUNT_REGX13X        1.7659       0.5785        2.344 (75%)       1.0000
COUNT_REGX14X        1.7631       0.5481        2.311 (76%)       1.0000
COUNT_REGX15X        1.7605       0.5192        2.280 (77%)       1.0000
U33                  1.5441       63.1248       64.669 (2%)       2.0000
-----------------------------------------------------------------------------------
Totals (17 cells)    300.262uW    748.971uW     1.049mW (29%)     18.000nW
1

/* Include Simulation Toggles for Some Internal Nets */
include partial_sim_toggle.scr

set_switching_activity -period 340 -toggle_rate 2
find(pin, "COUNT_REGX2X/Q");
Performing set_switching_activity on pin 'COUNT_REGX2X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2
find(pin, "COUNT_REGX2X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX2X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 2
find(pin, "COUNT_REGX1X/Q");
Performing set_switching_activity on pin 'COUNT_REGX1X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2
find(pin, "COUNT_REGX1X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX1X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 1
find(pin, "COUNT_REGX0X/Q");
Performing set_switching_activity on pin 'COUNT_REGX0X/Q'.
1

set_switching_activity -period 340 -toggle_rate 1
find(pin, "COUNT_REGX0X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX0X/QZ'.
1

1


/* Report power using hybrid mixture of simulation and probabilistic
propagation */

report_power -net -nworst 10
Information: Updating design information... (UID-85)
Performing probabilistic propagation through design.

****************************************
Report : power
        -net
        -analysis_effort low
        -nworst 10
        -sort_mode net_switching_power
Design : ONEHOT_gated
Version: v3.3a-slot3a
Date   : Wed Mar 1 20:45:59 1995

Library(s) Used:

power_COM_MAX.db (File: /am/remote/dacl/Power_Demo/
lib/power_COM_MAX.db)

Operating Conditions:

Wire Loading Model Mode: enclosed

Design          Wire Loading Model    Library
-----------------------------------------------------
ONEHOT_gated    0.5K_TLM              power_COM_MAX.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW    (derived from V,C,T units)
    Leakage Power Units = 1nW

Attributes
----------
    a - Switching activity information annotated on net

                   Total       Static    Toggle    Switching
Net                Net Load    Prob.     Rate      Power        Attrs
----------------------------------------------------------------------
gated_clock        21.730      0.250     0.0515    63.1248
clkb               2.624       0.500     0.1000    14.8124      a
resetb             32.580      0.944     0.0029    5.4082       a
count35x4x         3.908       0.472     0.0128    2.8336
count35x5x         3.908       0.446     0.0127    2.8090
count35x6x         3.908       0.421     0.0126    2.7714
count35x7x         3.908       0.398     0.0123    2.7231
count35x8x         3.908       0.376     0.0121    2.6661
count35x9x         3.908       0.355     0.0118    2.6021
count35x10x        3.908       0.335     0.0115    2.5325
----------------------------------------------------------------------
Totals (10 nets)                                   1.0228mW
1
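
The net report lists load, toggle rate and switching power side by side, so the rows can be checked against the conventional switching-power relation P = 1/2 * C * V^2 * TR. The Python sketch below is an illustration, not part of the session: it assumes that relation applies here, that loads are in the reported capacitance unit (read as 50.029999 fF), that toggle rates are transitions per 1ns time unit, and that powers are in units of 10uW. The names `NETS` and `switching_power` are hypothetical.

```python
VOLTAGE = 4.75               # "Global Operating Voltage"
CAP_UNIT_F = 50.029999e-15   # "Capacitance Units", assumed to mean femtofarads
TIME_UNIT_S = 1e-9           # "Time Units = 1ns"
POWER_UNIT_W = 10e-6         # "Dynamic Power Units = 10uW"

# (net, total net load, toggle rate, reported switching power) from the report.
NETS = [
    ("gated_clock", 21.730, 0.0515, 63.1248),
    ("clkb", 2.624, 0.1000, 14.8124),
    ("resetb", 32.580, 0.0029, 5.4082),
    ("count35x4x", 3.908, 0.0128, 2.8336),
    ("count35x5x", 3.908, 0.0127, 2.8090),
    ("count35x6x", 3.908, 0.0126, 2.7714),
    ("count35x7x", 3.908, 0.0123, 2.7231),
    ("count35x8x", 3.908, 0.0121, 2.6661),
    ("count35x9x", 3.908, 0.0118, 2.6021),
    ("count35x10x", 3.908, 0.0115, 2.5325),
]


def switching_power(load, toggle_rate):
    """Switching power in report units, assuming P = 0.5 * C * V^2 * TR."""
    watts = 0.5 * (load * CAP_UNIT_F) * VOLTAGE ** 2 * (toggle_rate / TIME_UNIT_S)
    return watts / POWER_UNIT_W


for name, load, rate, reported in NETS:
    estimate = switching_power(load, rate)
    error = 100.0 * (estimate - reported) / reported
    print(f"{name:12s} reported {reported:8.4f}  recomputed {estimate:8.4f}  ({error:+.1f}%)")

total_mw = sum(n[3] for n in NETS) * POWER_UNIT_W * 1e3
print(f"sum of reported rows: {total_mw:.4f} mW")
```

Under these assumptions each recomputed value lands within a few percent of the reported one, with the gap attributable to the toggle rates being printed to only a few digits, and the reported rows sum to the 1.0228mW total.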

/* Include Simulation Toggles for Some Internal Nets */

include partial_sim_toggle.scr
set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX2X/Q");
Performing set_switching_activity on pin 'COUNT_REGX2X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX2X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX2X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX1X/Q");
Performing set_switching_activity on pin 'COUNT_REGX1X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX1X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX1X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 1 find(pin,"COUNT_REGX0X/Q");
Performing set_switching_activity on pin 'COUNT_REGX0X/Q'.
1

set_switching_activity -period 340 -toggle_rate 1 find(pin,"COUNT_REGX0X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX0X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX15X/Q");
Performing set_switching_activity on pin 'COUNT_REGX15X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX15X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX15X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX14X/Q");
Performing set_switching_activity on pin 'COUNT_REGX14X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX14X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX14X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX13X/Q");
Performing set_switching_activity on pin 'COUNT_REGX13X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX13X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX13X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX12X/Q");
Performing set_switching_activity on pin 'COUNT_REGX12X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX12X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX12X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 1 find(pin,"COUNT_REGX11X/Q");
Performing set_switching_activity on pin 'COUNT_REGX11X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX11X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX11X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX10X/Q");
Performing set_switching_activity on pin 'COUNT_REGX10X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX10X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX10X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 18 find(pin,"U33/Y");
Performing set_switching_activity on pin 'U33/Y'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX9X/Q");
Performing set_switching_activity on pin 'COUNT_REGX9X/Q'.
1

set_switching_activity -period 340 -toggle_rate 0 find(pin,"COUNT_REGX9X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX9X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 1 find(pin,"COUNT_REGX8X/Q");
Performing set_switching_activity on pin 'COUNT_REGX8X/Q'.
1

set_switching_activity -period 340 -toggle_rate 1 find(pin,"COUNT_REGX8X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX8X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX7X/Q");
Performing set_switching_activity on pin 'COUNT_REGX7X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX7X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX7X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX6X/Q");
Performing set_switching_activity on pin 'COUNT_REGX6X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX6X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX6X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX5X/Q");
Performing set_switching_activity on pin 'COUNT_REGX5X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX5X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX5X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX4X/Q");
Performing set_switching_activity on pin 'COUNT_REGX4X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX4X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX4X/QZ'.
1

set_switching_activity -period 340 -toggle_rate 1 find(pin,"COUNT_REGX3X/Q");
Performing set_switching_activity on pin 'COUNT_REGX3X/Q'.
1

set_switching_activity -period 340 -toggle_rate 2 find(pin,"COUNT_REGX3X/QZ");
Performing set_switching_activity on pin 'COUNT_REGX3X/QZ'.
1

1

/* Report power using hybrid mixture of simulation annotation only */

report_power -net -nworst 10
Information: Updating design information... (UID-85)
Performing probabilistic propagation through design.

Report : power
        -net
        -analysis_effort low
        -nworst 10
        -sort_mode net_switching_power
Design : ONEHOT_gated
Version: v3.3a-slot3a
Date   : Wed Mar 1 20:46:04 1995
****************************************

Library(s) Used:

power_COM_MAX.db (File: /am/remote/dacl/Power_Demo/
lib/power_COM_MAX.db)

Operating Conditions:

Wire Loading Model Mode: enclosed

Design          Wire Loading Model    Library
-----------------------------------------------------
ONEHOT_gated    0.5K_TLM              power_COM_MAX.db

Global Operating Voltage = 4.75
Power-specific unit information:
    Voltage Units = 1V
    Capacitance Units = 50.029999ff
    Time Units = 1ns
    Dynamic Power Units = 10uW    (derived from V,C,T units)
    Leakage Power Units = 1nW

Attributes
----------
    a - Switching activity information annotated on net

                   Total       Static    Toggle    Switching
Net                Net Load    Prob.     Rate      Power        Attrs
----------------------------------------------------------------------
gated_clock                    0.500     0.0529    44.0294      a
clkb               2.624       0.500     0.1000    14.8124      a
resetb             32.580      0.944     0.0029    5.4082       a
count35x2x         3.908       0.500     0.0059    1.2974       a
count35x3x         3.908       0.500     0.0059    1.2974       a
count35x4x         3.908       0.500     0.0059    1.2974       a
count35x5x         3.908       0.500     0.0059    1.2974       a
count35x6x         3.908       0.500     0.0059    1.2974       a
count35x7x         3.908       0.500     0.0059    1.2974       a
count35x8x         3.908       0.500     0.0059    1.2974       a
----------------------------------------------------------------------
Totals (10 nets)                                   942.3093 uW
1

quit
1

dc_shell>
Memory usage for this session 9025 Kbytes.
CPU usage for this session 32 seconds.
Thank you...






1. A computer memory which includes a data structure stored therein,
the data structure comprising:

an array which includes elements for storing discrete energy values for
a prescribed library cell;

a collection of pairings of library cell output capacitance values and
corresponding library cell weighted average input transition times; and

a collection of references from individual pairings to individual array
elements.

2. A computer memory which includes a data structure stored therein,
the data structure comprising:

a two dimensional array which includes elements for storing discrete
energy values for a prescribed library cell;

a collection of library cell output capacitance values organized in the
memory in order of increasing magnitude along a first dimension of the array;

a collection of library cell weighted average input transition times
organized in the memory in order of increasing magnitude along a second
dimension of the array; and

wherein individual library cell output capacitance values provide
references to array elements along the first array dimension and individual
library cell weighted average transition times provide references to array
elements along the second array dimension.

3. The memory of claim 1 or 2 wherein the memory further includes:

a netlist which includes an instantiation of the prescribed library cell;

a first index into the array provided by a computed output capacitance
value for the instantiation of the prescribed library cell in the netlist; and

a second index into the array provided by a computed weighted average
input transition time for the instantiation of the prescribed library cell in the
netlist.
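
The data structure of claims 1 to 3 can be pictured as a two dimensional energy table whose axes are sorted output capacitance values and sorted weighted average input transition times. The Python sketch below is only an illustration under stated assumptions: the class name `EnergyTable`, the sample numbers, and the use of clamped bilinear interpolation between stored elements are all choices made here, not details taken from the claims, which recite only that the two values provide indices into the array.

```python
from bisect import bisect_right


class EnergyTable:
    """Energy per transition for one library cell, indexed by
    (output capacitance, weighted average input transition time)."""

    def __init__(self, caps, times, energy):
        # Both axes are kept in order of increasing magnitude, as in claim 2.
        assert list(caps) == sorted(caps) and len(caps) >= 2
        assert list(times) == sorted(times) and len(times) >= 2
        assert len(energy) == len(caps)
        assert all(len(row) == len(times) for row in energy)
        self.caps, self.times, self.energy = list(caps), list(times), energy

    @staticmethod
    def _bracket(axis, x):
        """Return (i, frac): index of the lower breakpoint and the fractional
        position of x between axis[i] and axis[i + 1], clamped to [0, 1]."""
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        frac = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, min(max(frac, 0.0), 1.0)

    def lookup(self, cap, time):
        """Bilinear interpolation between the four surrounding elements
        (an assumption of this sketch)."""
        i, u = self._bracket(self.caps, cap)
        j, v = self._bracket(self.times, time)
        e = self.energy
        low = e[i][j] * (1 - v) + e[i][j + 1] * v
        high = e[i + 1][j] * (1 - v) + e[i + 1][j + 1] * v
        return low * (1 - u) + high * u


# Hypothetical table: rows follow capacitance, columns follow transition time.
table = EnergyTable(caps=[1.0, 2.0], times=[0.5, 1.5],
                    energy=[[10.0, 20.0], [30.0, 40.0]])
print(table.lookup(1.5, 1.0))
```

In the terms of claim 3, the capacitance and transition time passed to `lookup` would be the values computed for one instantiation of the cell in the netlist.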

4. A method of selecting a preferred instantiation of a prescribed
library cell in a netlist based upon the internal power dissipation of the
prescribed cell comprising the steps of:

traversing the netlist;

computing an output capacitance value for a current instantiation of the
prescribed library cell in the netlist;

computing a weighted average input transition time for the current
instantiation;

computing internal power dissipation of the current instantiation of the
prescribed library cell based upon the computed output capacitance value and
the computed weighted average input transition time;

selecting an alternative instantiation of the library cell in the netlist
based upon the internal power computation; and

instantiating the selected alternative instantiation in the netlist.

5. An improved method for managing the use of electronic memory in
the course of estimating average power consumption of an electronic circuit
represented as a netlist comprising the steps of:

ranking, in the electronic memory, primary outputs of the netlist with
respect to each other in an order that depends upon the number of logic levels
between respective primary outputs and respective primary inputs that feed into
such respective primary outputs;

performing a depth-first traversal of the netlist, in the electronic
memory, that follows the primary output ranking order; and

in the course of performing the depth-first traversal,

constructing, in the electronic memory, a respective binary
decision diagram (BDD) for each respective netlist node that feeds a first
primary output of the netlist by constructing a respective BDD for each
respective deeper logic level netlist node feeding such first primary output prior
to constructing a respective BDD for a respective shallower logic level netlist
node feeding such first primary output;

computing a respective switching activity value for each
respective constructed BDD; and

releasing a respective BDD from the electronic memory when a
respective BDD has been constructed for every respective fanout of the netlist
node associated with such respective released BDD.

6. The method of claim 5,

wherein the step of constructing produces in the electronic memory at
least one respective BDD for a deeper level netlist node that serves as a basis
for construction of at least one respective BDD for a shallower logic level
netlist node.

7. The method of claim 5,

wherein the step of constructing produces in the electronic memory a
first BDD for a deeper level netlist node that both serves as a basis for
construction of a second BDD for a second shallower logic level netlist node
and also serves as a basis for construction of a third BDD for a third shallower
logic level netlist node;

wherein the step of releasing includes releasing the first constructed
BDD from the electronic memory when the second BDD and the third BDD
have been constructed in the electronic memory.

8. The method of claim 5,

wherein the step of constructing produces in the electronic memory a
first BDD for a deeper level netlist node that both serves as a basis for
construction of a second BDD for a second shallower logic level netlist node
and also serves as a basis for construction of a third BDD for a third shallower
logic level netlist node; and further including the steps of:

storing in the electronic memory a fanout status for the first deeper
logic level netlist node; and

in the course of performing the depth-first traversal,

adjusting the fanout status when the second BDD and the third
BDD are constructed;

wherein the step of releasing includes releasing the first constructed
BDD from the electronic memory when the fanout status is adjusted.

9. The method of claim 5,

wherein the step of constructing produces in the electronic memory a
first BDD for a deeper logic level netlist node that both serves as a basis for
construction of a second BDD for a second shallower logic level netlist node
and also serves as a basis for construction of a third BDD for a third shallower
logic level netlist node; and further including the step of:

storing in the electronic memory a fanout count for the first deeper
logic level netlist node; and

in the course of performing the depth-first traversal,

decrementing the fanout count when the second BDD is
constructed; and

decrementing the fanout count when the third BDD is
constructed;

wherein the step of releasing includes releasing the first constructed
BDD from the electronic memory when the fanout count has been twice
decremented.

10. The method of claim 5,

wherein the step of constructing produces in the electronic memory a
first BDD for a deeper logic level netlist node that both serves as a basis for
construction of n BDDs for n shallower logic level netlist nodes; and further
including the steps of:

storing in the electronic memory a fanout count for the first deeper
logic level netlist node; and

in the course of performing the depth-first traversal,

decrementing the fanout count each time one of the n BDDs is
constructed;

wherein the step of releasing includes releasing the first BDD from the
electronic memory when the fanout count has been decremented n times.
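
The fanout-count bookkeeping of claims 5 to 10 can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the claimed implementation: a truth-table bitmask stands in for each BDD, primary inputs are treated as independent with static probability 0.5, only static probability (not toggle rate) is computed, and primary outputs are ranked deepest first, which is one possible reading of the ranking step. The function name `estimate` and the gate encoding are hypothetical.

```python
from functools import reduce

# Each "BDD" is stood in for by a truth table packed into an int (assumption).
OPS = {
    "AND": lambda ts, full: reduce(lambda x, y: x & y, ts),
    "OR": lambda ts, full: reduce(lambda x, y: x | y, ts),
    "XOR": lambda ts, full: reduce(lambda x, y: x ^ y, ts),
    "NOT": lambda ts, full: full ^ ts[0],
}


def estimate(inputs, gates, outputs):
    """gates maps node -> (op, [fanin names]). Returns (static probability
    per node, peak number of function tables held at once, tables left over)."""
    rows = 1 << len(inputs)
    full = (1 << rows) - 1
    input_tt = {name: sum(1 << r for r in range(rows) if (r >> k) & 1)
                for k, name in enumerate(inputs)}

    # Fanout count per gate: how many gates still need its function.
    remaining = {g: 0 for g in gates}
    for _, fanins in gates.values():
        for f in fanins:
            if f in gates:
                remaining[f] += 1

    depth_cache = {}

    def depth(node):
        if node in input_tt:
            return 0
        if node not in depth_cache:
            depth_cache[node] = 1 + max(depth(f) for f in gates[node][1])
        return depth_cache[node]

    live, prob, peak = {}, {}, 0

    def build(node):
        nonlocal peak
        if node in input_tt or node in prob:
            return
        op, fanins = gates[node]
        for f in fanins:                  # deeper nodes are built first
            build(f)
        tables = [input_tt[f] if f in input_tt else live[f] for f in fanins]
        live[node] = OPS[op](tables, full)
        prob[node] = bin(live[node]).count("1") / rows
        peak = max(peak, len(live))
        for f in fanins:                  # decrement, release at zero
            if f in gates:
                remaining[f] -= 1
                if remaining[f] == 0:
                    del live[f]
        if remaining[node] == 0:          # no fanout left to serve
            del live[node]

    for out in sorted(outputs, key=depth, reverse=True):
        build(out)
    return prob, peak, live


gates = {"g1": ("AND", ["a", "b"]),
         "g2": ("XOR", ["g1", "c"]),
         "g3": ("OR", ["g1", "c"])}
prob, peak, live = estimate(["a", "b", "c"], gates, ["g2", "g3"])
print(prob, peak, live)
```

In this small example `g1` has a fanout count of two, so its table is held until both `g2` and `g3` have been built and is then released, which mirrors the twice-decremented count of claim 9.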

11. The method of claim 5 including the steps of:

in the course of performing the depth-first traversal,

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a second primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such second primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such second primary output.

12. The method of claim 5 including the steps of:

in the course of performing the depth-first traversal,

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a second primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such second primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such second primary output, wherein
a respective BDD constructed for a deeper logic level netlist node that feeds
the second primary output may serve as a basis for construction of a respective
BDD for a shallower logic level netlist node that feeds the second primary
output.

13. The method of claim 5 including the steps of:

in the course of performing the depth-first traversal,

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a second primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such second primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such second primary output; and

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a third primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such third primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such third primary output.

14. The method of claim 5 including the steps of:

in the course of performing the depth-first traversal,

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a second primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such second primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such second primary output, wherein
a respective BDD constructed for a deeper logic level netlist node that feeds
the second primary output may serve as a basis for construction of a respective
BDD for a shallower logic level netlist node that feeds the second primary
output; and

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a third primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such third primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such third primary output, wherein a
respective BDD constructed for a deeper logic level netlist node that feeds the
third primary output may serve as a basis for construction of a respective BDD
for a shallower logic level netlist node that feeds the third primary output.

15. The method of claim 5 including the step of:

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a second primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such second primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such second primary output;

wherein the steps of constructing produce in the electronic memory a
first constructed BDD for a deeper level netlist node that both feeds the first
primary input and that also feeds the second primary input and that both serves
as a basis for a shallower logic level netlist node that feeds the first primary
input and also serves as a basis for a shallower logic level netlist node that
feeds the second primary input.

16. The method of claim 5 including the step of:

constructing in the electronic memory a respective BDD for
each respective netlist node that feeds a second primary output of the netlist by
constructing a respective BDD for each deeper logic level netlist node feeding
such second primary output prior to constructing a respective BDD for a
shallower logic level netlist node feeding such second primary output;

wherein the steps of constructing produce in the electronic memory a
first BDD for a deeper level netlist node that both feeds the first primary input
and that also feeds the second primary input and that both serves as a basis for
construction of a second BDD for a shallower logic level netlist node that feeds
the first primary input and also serves as a basis for construction of a third
BDD for a shallower logic level netlist node that feeds the second primary
input; and

wherein the step of releasing includes releasing the first constructed
BDD from the electronic memory when both the second BDD and the third
BDD have been constructed in the electronic memory.

17. The method of claim 5 wherein the step of computing involves
computing a respective static probability and a respective toggle rate for each
respective constructed BDD.
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18. An improved method for managing the use of electronic memory in 
the course of estimating average power consumption of an electronic circuit 
represented as a netlist comprising the steps of: 

ranking, in the electronic memory, primary outputs of the netlist with
respect to each other in an order that depends upon the number of logic levels
between respective primary outputs and respective primary inputs that feed into
such respective primary outputs;

performing a depth-first traversal of the netlist, in the electronic
memory, that follows the primary output ranking order; and

in the course of performing the depth-first traversal,

constructing, in the electronic memory, a respective binary
decision diagram (BDD) for each respective netlist node that feeds a first
primary output of the netlist by constructing a respective BDD for each
respective deeper logic level netlist node feeding such first primary output prior
to constructing a respective BDD for a respective shallower logic level netlist
node feeding such first primary output;

releasing each respective deeper logic level BDD from the
electronic memory for which a respective shallower logic level BDD has been
constructed for each respective fanout of a respective netlist node associated
with such respective released BDD;

storing in the electronic memory identification of the respective
frontier BDDs which are nonreleased BDDs produced in the electronic
memory;

determining when the amount of electronic memory used
exceeds a defined limit; and

releasing from the electronic memory a first frontier BDD when
the amount of electronic memory used exceeds the defined limit.

19. The method of claim 18 including the further step of:

in the course of performing the depth-first traversal,

substituting a first pseudo-primary input for the first frontier
BDD released from the electronic memory.
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20. The method of claim 18 including the step of: 
in the course of performing the depth-first traversal, 

computing a respective switching activity value for each 
respective constructed BDD. 

21. The method of claim 18 including the step of:

in the course of performing the depth-first traversal,

computing a respective TR and a respective SP for each
respective constructed BDD.

22. The method of claim 18 wherein said step of determining when the
amount of electronic memory used exceeds a defined limit involves determining
when the amount of electronic memory occupied by BDDs exceeds the defined
limit.
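
The memory-limit behaviour of claims 18 to 22 can be illustrated with a small bookkeeping class. The Python sketch below rests on assumptions that the claims leave open: BDD size is tracked as a plain number, the frontier BDD chosen for release is the largest one, and a pseudo-primary input is modelled as an entry of size 1 that replaces the released BDD. The names `FrontierStore`, `add` and `release` are hypothetical.

```python
class FrontierStore:
    """Tracks non-released ("frontier") BDDs and enforces a memory limit."""

    PSEUDO_INPUT_SIZE = 1  # assumed cost of a pseudo-primary input variable

    def __init__(self, limit):
        self.limit = limit
        self.frontier = {}        # node -> size of the BDD currently held
        self.pseudo_inputs = []   # nodes replaced by pseudo-primary inputs

    def used(self):
        """Amount of memory occupied by BDDs (claim 22)."""
        return sum(self.frontier.values())

    def add(self, node, size):
        """Record a newly constructed BDD, then shed frontier BDDs while the
        defined limit is exceeded."""
        self.frontier[node] = size
        while self.used() > self.limit:
            candidates = [n for n, s in self.frontier.items()
                          if s > self.PSEUDO_INPUT_SIZE]
            if not candidates:
                break  # nothing left that releasing would shrink
            victim = max(candidates, key=self.frontier.get)  # assumed policy
            self.frontier[victim] = self.PSEUDO_INPUT_SIZE
            self.pseudo_inputs.append(victim)

    def release(self, node):
        """Ordinary release once every fanout of the node has its BDD."""
        self.frontier.pop(node, None)


store = FrontierStore(limit=10)
store.add("n1", 7)
store.add("n2", 2)
store.add("n3", 4)   # 13 > 10, so the largest frontier BDD (n1) is replaced
print(store.pseudo_inputs, store.used())
```

Replacing a frontier BDD with a fresh variable trades accuracy for memory, since downstream functions are then built over the pseudo-primary input rather than over the original primary inputs.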



23. A method for estimating average power consumption of an
electronic circuit that includes sequential elements represented as a netlist
stored in an electronic memory comprising the steps of:

producing in the electronic memory a graph representing the electronic
circuit in which sequential elements are represented as nodes and combinational
logic elements connections between sequential elements are represented as
directed arcs;

removing from the graph a first node that forms part of a cyclic path
within the graph and that represents a first sequential element of the electronic
circuit;

producing in the graph a first source node that represents the first
sequential element of the electronic circuit;

producing in the graph a first load node that represents the first
sequential element of the electronic circuit;

producing in the graph a respective corresponding first source arc that
represents a respective arc output from the removed first node, each respective
first source arc having the first source node as its origin and having a
destination that is the same as its corresponding arc output of the removed first
node;

producing in the graph a respective corresponding first load arc that
represents a respective arc input to the removed first node, each respective first
load arc having the first load node as its destination and having a source that is
the same as its corresponding arc input to the removed first node;

grouping the nodes and the arcs of the graph into respective graph
levels, each corresponding to a respective group of sequential logic of the
electronic circuit and to a respective group of combinational logic of the
electronic circuit that feeds such respective group of sequential logic; and

computing respective switching activity values for nets of the netlist in
an order prescribed by the graph by computing activity values for nets of a
respective group of nets representing a respective group of combinational logic
corresponding to a given graph level, using as respective primary inputs to the
respective group of nets, switching activity values computed for another
respective group of nets representing a respective group of deeper logic level
combinational logic corresponding to a respective deeper graph level.

24. The method of claim 23,

wherein said step of producing in the graph a respective corresponding
first source arc involves changing the origin of the respective arc output from
the removed first node so that it becomes an arc output from the first source
node; and

wherein said step of producing in the graph a respective corresponding
first load arc involves changing the destination of the respective arc input to the
removed first node so that it becomes an arc input to the first load node.

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25. The method of claim 23, 

wherein said step of computing involves using as respective primary
inputs to the respective group of nets, switching activity values corresponding
to the respective deeper graph level immediately below the given graph level.


26. A method for estimating average power consumption of an
electronic circuit that includes sequential elements represented as a netlist
stored in an electronic memory comprising the steps of:

producing in the electronic memory a graph representing the electronic
circuit in which sequential elements are represented as nodes and combinational
logic elements connections between sequential elements are represented as
directed arcs;

removing from the graph a first node that forms part of a cyclic path
within the graph and that represents a first sequential element of the electronic
circuit;

producing in the graph a first source node that represents the first
sequential element of the electronic circuit;

producing in the graph a first load node that represents the first
sequential element of the electronic circuit;

producing in the graph a respective corresponding first source arc that
represents a respective arc output from the removed first node, each respective
first source arc having the first source node as its origin and having a
destination that is the same as its corresponding arc output of the removed first
node;

producing in the graph a respective corresponding first load arc that
represents a respective arc input to the removed first node, each respective first
load arc having the first load node as its destination and having a source that is
the same as its corresponding arc input to the removed first node;

grouping the nodes and the arcs of the graph into respective graph
levels, each corresponding to a respective group of sequential logic of the
electronic circuit and to a respective group of combinational logic of the
electronic circuit that feeds such respective group of sequential logic; and

computing respective switching activity values for nets of the netlist in
an order prescribed by the graph levels such that computation of respective
switching activity values for given logic corresponding to a given graph level
uses a switching activity of a prior node computed from prior logic
corresponding to a prior graph level as a basis for a primary input to such given
logic.

27. The method of claim 26 wherein said step of computing involves
computing such that a primary output of such given logic is used as a basis for
a primary input to subsequent logic corresponding to a subsequent graph level.
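
The graph manipulation recited in claims 23 to 27 can be sketched as follows. This Python fragment is a minimal illustration, assuming sequential elements are named by strings and arcs are (origin, destination) pairs; the `.src` and `.load` suffixes, the choice of which node on a cycle to split, and numbering levels by longest path from the source nodes are all assumptions of the sketch rather than details taken from the claims.

```python
def find_node_on_cycle(arcs):
    """Return some node lying on a cyclic path, or None if the graph is acyclic."""
    succ = {}
    for a, b in sorted(arcs):
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])
    state = {}  # 1 = on the current DFS path, 2 = finished

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state.get(m) == 1:
                return m              # back edge: m is on a cycle
            if m not in state:
                found = visit(m)
                if found is not None:
                    return found
        state[n] = 2
        return None

    for n in sorted(succ):
        if n not in state:
            found = visit(n)
            if found is not None:
                return found
    return None


def break_cycles(arcs):
    """Split each offending node into a source node and a load node."""
    arcs = set(arcs)
    split = []
    while (node := find_node_on_cycle(arcs)) is not None:
        src, load = f"{node}.src", f"{node}.load"
        # Arcs leaving the removed node now leave the source node;
        # arcs entering it now enter the load node.
        arcs = {(src if a == node else a, load if b == node else b)
                for a, b in arcs}
        split.append(node)
    return arcs, split


def levelize(arcs):
    """Assign each node a level: 0 for source nodes, else 1 + deepest predecessor."""
    nodes = {n for arc in arcs for n in arc}
    preds = {n: [a for a, b in arcs if b == n] for n in nodes}
    level = {}

    def depth(n):
        if n not in level:
            level[n] = 0 if not preds[n] else 1 + max(depth(p) for p in preds[n])
        return level[n]

    for n in nodes:
        depth(n)
    return level


# Three registers in a ring: A feeds B, B feeds C, C feeds A.
acyclic, split = break_cycles([("A", "B"), ("B", "C"), ("C", "A")])
levels = levelize(acyclic)
print(split, sorted(levels.items(), key=lambda kv: kv[1]))
```

Once the graph is acyclic, switching activity could be propagated level by level, with the values computed at one level serving as primary inputs for the logic of the next.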
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